THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


Notes on Hotelling’s Generalized T. P. L. Hsu
A Generalization of Markoff’s Inequality. A. Wald
A Modification of Bayes’ Problem. R. von Mises
On the Probability Theory of Arbitrarily Linked Events. Hilda Geiringer
Fiducial Distributions in Fiducial Inference.
Biological Applications of Normal Range and Associated Significance Tests in Ignorance of Original Distribution Forms. William R. Thompson
The Computation of Moments with the Use of Cumulative Totals. Paul S. Dwyer
A Note on the Derivation of Formulae for Multiple and Partial Correlation. Louis Guttman
Note on Regression Functions in the Case of Three Second-Order Random Variables. Clyde A. Bridger


Vol. IX, No. 4 — December, 1938 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor 


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. Carver     R. A. Fisher     R. de Mises
H. Cramér        T. C. Fry        E. S. Pearson
W. E. Deming     H. Hotelling     H. L. Rietz
G. Darmois       W. A. Shewhart


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the ANNALS is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:

Vols. I-IV $5.00 each. Single numbers $1.50.
Vols. V to date $4.00 each. Single numbers $1.25.

Subscriptions, renewals, orders for back numbers and other business
communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.

The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics.

COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.













NOTES ON HOTELLING’S GENERALIZED T 
By P. L. Hsu 


1. Frequency Distribution When the Hypothesis Tested is Not True 


a. THE PROBLEM. Let the simultaneous elementary probability law of the
$k(f + 1)$ variables $z_i$ and $z_{ir}$ $(i = 1, 2, \dots, k;\ r = 1, 2, \dots, f)$ be

$$p(z, z') = (\sqrt{2\pi})^{-k(f+1)}\, |C|^{\frac12 (f+1)} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij}\{(z_i - \zeta_i)(z_j - \zeta_j) + v_{ij}\}\Big], \tag{1}$$

where

$$v_{ij} = \sum_{r=1}^{f} z_{ir} z_{jr} \qquad (i, j = 1, 2, \dots, k),$$

$C$ stands for the matrix $\|c_{ij}\|$ and $|C|$ for the corresponding determinant.
It is required to find the elementary probability law of the statistic

$$T = |V|^{-1} \sum_{i,j=1}^{k} V_{ij} z_i z_j,$$

where $|V| = |v_{ij}|$ and $V_{ij}$ denotes the cofactor of the element $v_{ij}$ in the matrix
$\|v_{ij}\|$.

The quantity $fT$ is a generalization of “Student’s” $t$ considered by Hotelling
[1]*. It is an appropriate criterion to test the hypothesis, say $H_0$, that the $\zeta_i$ in
the parent population as given by (1) all vanish. The distribution of $T$ when
the hypothesis $H_0$ is true has already been obtained by Hotelling. But our
knowledge of the test is hardly complete unless we know also the distribution of
$T$ when the $\zeta_i$ do not all vanish. Indeed, only such a knowledge can enable us to
control the risk of error of the second kind, i.e. of failure to detect that $H_0$ is
untrue [3, 4].
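As an illustration only, the statistic $T$ can be evaluated directly from its cofactor definition. The sketch below uses Python with exact rational arithmetic; the function names (`det`, `cofactor`, `generalized_T`) and the numerical values are assumptions made for this example and do not come from the paper.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small k)."""
    if not m:
        return Fraction(1)  # determinant of the empty matrix
    total = Fraction(0)
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def cofactor(m, i, j):
    """Cofactor V_ij of the element v_ij in the matrix m."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
    return (-1) ** (i + j) * det(minor)

def generalized_T(z, V):
    """T = |V|^{-1} * sum_{i,j} V_ij z_i z_j, as defined in the text."""
    k = len(z)
    numerator = sum(cofactor(V, i, j) * z[i] * z[j]
                    for i in range(k) for j in range(k))
    return numerator / det(V)

# Hypothetical values, k = 2.
z = [Fraction(1), Fraction(2)]
V = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
print(generalized_T(z, V))  # 7/5
```

Since the matrix of cofactors divided by $|V|$ is the inverse of the symmetric matrix $V$, the same number is obtained as the quadratic form of $z$ in the inverse of $V$.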


b. THE SOLUTION. Let $H$ be a $k \times k$ non-singular matrix such that $H'CH = I$,
the unit matrix, where $H'$ denotes the transposed matrix of $H$. Let the sets of
variables $(z_1, z_2, \dots, z_k)$ and $(z_{1r}, z_{2r}, \dots, z_{kr})$ $(r = 1, 2, \dots, f)$ be subject
to the same collineation by means of $H$, so that

$$\|z_1, z_2, \dots, z_k\| = \|t_1, t_2, \dots, t_k\| \cdot H'$$

$$\|z_{1r}, z_{2r}, \dots, z_{kr}\| = \|t_{1r}, t_{2r}, \dots, t_{kr}\| \cdot H' \qquad (r = 1, 2, \dots, f)$$

where the $t_i$ and $t_{ir}$ are the new variables. Let further the quantities $\tau_i$ be
defined by

$$\|\zeta_1, \zeta_2, \dots, \zeta_k\| = \|\tau_1, \tau_2, \dots, \tau_k\| \cdot H'. \tag{2}$$

* References are given at the end of the paper.


Then, as is easy to verify, the simultaneous distribution of the new variables
will be given by

$$p_1(t, t') = (\sqrt{2\pi})^{-k(f+1)} \exp\Big[-\tfrac12 \sum_{i=1}^{k} \{(t_i - \tau_i)^2 + u_{ii}\}\Big], \tag{3}$$

while the statistic $T$, as a function of the $t$'s, retains the original form:

$$T = |U|^{-1} \sum_{i,j=1}^{k} U_{ij} t_i t_j, \tag{4}$$

where

$$u_{ij} = \sum_{r=1}^{f} t_{ir} t_{jr} \qquad (i, j = 1, 2, \dots, k),$$

$|U| = |u_{ij}|$, and $U_{ij}$ is the cofactor of the element $u_{ij}$ in the matrix $\|u_{ij}\|$.
By virtue of (2) we have the following relation between the old and new
parametric constants:

$$\sum_{i=1}^{k} \tau_i^2 = \sum_{i,j=1}^{k} c_{ij} \zeta_i \zeta_j. \tag{5}$$

Our problem is thus reduced to finding the derived distribution of $T$ defined by
(4) from the parent population given by (3).

We solve this problem by obtaining an expression for the Laplace integral
$E(e^{-\beta T})$, i.e. the mathematical expectation of $e^{-\beta T}$ for real non-negative $\beta$. A few
words are perhaps needed to explain the fact that the Laplace transform of an
elementary probability law determines the latter uniquely except on a null set
of points. If $f(x)$ is an elementary probability law which vanishes for all
negative $x$ and if

$$g(\beta) = \int_0^\infty e^{-\beta x} f(x)\,dx \qquad \text{for } \beta > 0,$$

then, letting $c$ be any fixed positive constant, we have

$$g(c - \beta) = \int_0^\infty e^{\beta x} e^{-cx} f(x)\,dx$$

for all $\beta < c$. We get therefore

$$m_h = \int_0^\infty x^h e^{-cx} f(x)\,dx = \frac{d^h}{d\beta^h}\, g(c - \beta)\Big|_{\beta = 0} \qquad (h = 0, 1, 2, \dots),$$

the definite integral being obviously finite for all $h \ge 0$. Now a sufficient
condition that the set of numbers $m_h$ determines the function $e^{-cx} f(x)$ uniquely,
with the exception of a null set at most, is that the latter multiplied by $e^{k\sqrt{x}}$
be summable $(0, \infty)$ for some positive $k$ (cf. [6], p. 320). Since this condition is
trivially satisfied by the function $e^{-cx} f(x)$, this function, and therefore $f(x)$ itself,
must be uniquely determined by the $m_h$. In other words, $f(x)$ is uniquely
determined by its Laplace transform $g(\beta)$. We now proceed to find the Laplace
integral $E(e^{-\beta T})$.
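As a small worked illustration of the moment argument (the choice of $f$ is an assumption made for the example, not taken from the paper), let $f(x) = e^{-x}$ for $x \ge 0$. Then the Laplace transform and the derived moments of $e^{-cx} f(x)$ are

$$g(\beta) = \int_0^\infty e^{-\beta x} e^{-x}\,dx = \frac{1}{1 + \beta}, \qquad g(c - \beta) = \frac{1}{1 + c - \beta},$$

$$\frac{d^h}{d\beta^h}\, g(c - \beta)\Big|_{\beta = 0} = \frac{h!}{(1 + c)^{h+1}} = \int_0^\infty x^h e^{-(1 + c)x}\,dx,$$

so the derivatives of the transform at a single point reproduce every moment of $e^{-cx} f(x)$, which is the step the uniqueness argument relies on.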

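Introduce the function

$$g(t, t', \theta, \alpha) = (\sqrt{2\pi})^{-k}\, |U|^{\frac12} \exp\Big[-\tfrac12 \Big(\sum_{i,j=1}^{k} u_{ij} \theta_i \theta_j + 2i\alpha \sum_{i=1}^{k} t_i \theta_i\Big)\Big]$$

and write

$$F(t, t', \theta, \alpha) = p_1(t, t')\, g(t, t', \theta, \alpha),$$

where all the arguments take real values only. For any functions $\varphi(\theta)$ and
$\psi(t, t')$ let us write

$$\int \varphi(\theta)\,d\theta = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \varphi(\theta)\,d\theta_1 \cdots d\theta_k,$$

$$\int \psi(t, t')\,d(t, t') = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \psi(t, t')\,dt_1 \cdots dt_k\,dt_{11} \cdots dt_{kf}.$$

We have

$$\int d(t, t') \int |F(t, t', \theta, \alpha)|\,d\theta = \int p_1(t, t')\,d(t, t') \int g(t, t', \theta, 0)\,d\theta = 1,$$

whence we know that

$$\int d(t, t') \int F\,d\theta = \int d\theta \int F\,d(t, t'). \tag{6}$$

On the left-hand side of (6) we find

$$\int p_1(t, t')\,d(t, t') \int g(t, t', \theta, \alpha)\,d\theta = \int e^{-\frac12 \alpha^2 T} p_1(t, t')\,d(t, t') = E(e^{-\frac12 \alpha^2 T}),$$

while for the integral on the right-hand side of (6) we have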


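$$\int F\,d(t, t') = (\sqrt{2\pi})^{-k(f+2)} \exp\Big(-\tfrac12 \sum_{i=1}^{k} \tau_i^2\Big) \int \exp\Big[-\tfrac12 \sum_{i=1}^{k} \{t_i^2 + 2(i\alpha\theta_i - \tau_i) t_i\}\Big]\,dt \times \int |U|^{\frac12} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} (\theta_i \theta_j + \delta_{ij}) u_{ij}\Big]\,dt', \tag{7}$$

where we mean by the $\delta_{ij}$ the quantities

$$\delta_{ij} = 0 \ \text{for } i \ne j, \qquad \delta_{ii} = 1 \qquad (i, j = 1, 2, \dots, k).$$

In the equation (7) the integral with respect to the $t_i$ is immediately written
down as

$$(\sqrt{2\pi})^{k} \exp\Big[\tfrac12 \sum_{i=1}^{k} (\tau_i - i\alpha\theta_i)^2\Big].$$

As to the integral with respect to the $t_{ir}$, we may evaluate it by the method by
which Wilks [7] evaluated the moments of the generalized variance. The
result is

$$2^{\frac12 k} (\sqrt{2\pi})^{kf}\, |\theta_i \theta_j + \delta_{ij}|^{-\frac12 (f+1)}\, \frac{\Gamma(\frac12 (f + 1))}{\Gamma(\frac12 (f + 1 - k))}.$$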


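Making the substitution into (7) we get, after necessary reductions,

$$\int F\,d(t, t') = \pi^{-\frac12 k}\, \frac{\Gamma(\frac12 (f + 1))}{\Gamma(\frac12 (f + 1 - k))}\, |\theta_i \theta_j + \delta_{ij}|^{-\frac12 (f+1)} \exp\Big[-\tfrac12 \sum_{i=1}^{k} \{\alpha^2 \theta_i^2 + 2i\alpha\tau_i\theta_i\}\Big],$$

whence, noticing that $|\theta_i \theta_j + \delta_{ij}| = 1 + \sum_{i=1}^{k} \theta_i^2$,

$$E(e^{-\frac12 \alpha^2 T}) = \pi^{-\frac12 k}\, \frac{\Gamma(\frac12 (f + 1))}{\Gamma(\frac12 (f + 1 - k))} \int \Big(1 + \sum_{i=1}^{k} \theta_i^2\Big)^{-\frac12 (f+1)} \exp\Big[-\tfrac12 \sum_{i=1}^{k} \{\alpha^2 \theta_i^2 + 2i\alpha\tau_i\theta_i\}\Big]\,d\theta. \tag{8}$$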


Equation (8) gives the Laplace transform of the elementary probability law,
$p(T)$, of $T$. There is no essential difficulty in getting $p(T)$ by inversion directly
from (8). Nevertheless, it may be of interest to get $p(T)$ indirectly by
identifying the right-hand side of (8) with the Laplace transform of another
elementary probability law which is otherwise known. For this purpose consider
the simultaneous elementary probability law


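$$p(x, y) = (\sqrt{2\pi})^{-(f_1 + f_2)} \exp\Big[-\tfrac12 \sum_{i=1}^{f_1} (x_i - \tau_i)^2 - \tfrac12 \sum_{j=1}^{f_2} y_j^2\Big]$$

and let us find the derived distribution of the statistic

$$L = \sum_{i=1}^{f_1} x_i^2 \Big/ \sum_{j=1}^{f_2} y_j^2.$$

As before, we introduce the function

$$g(x, y, \theta, \alpha) = (\sqrt{2\pi})^{-f_1} \Big(\sum_{j=1}^{f_2} y_j^2\Big)^{\frac12 f_1} \exp\Big[-\tfrac12 \Big(\sum_{j=1}^{f_2} y_j^2 \sum_{i=1}^{f_1} \theta_i^2 + 2i\alpha \sum_{i=1}^{f_1} x_i \theta_i\Big)\Big],$$

write

$$F(x, y, \theta, \alpha) = p(x, y)\, g(x, y, \theta, \alpha)$$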

and ascertain that

$$\int d(x, y) \int F\,d\theta = \int d\theta \int F\,d(x, y). \tag{9}$$

On the left-hand side of (9) we find

$$\int e^{-\frac12 \alpha^2 L}\, p(x, y)\,d(x, y) = E(e^{-\frac12 \alpha^2 L}),$$

while for the integral on the right-hand side of (9), we have
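$$\int F\,d(x, y) = (\sqrt{2\pi})^{-(2 f_1 + f_2)} \exp\Big(-\tfrac12 \sum_{i=1}^{f_1} \tau_i^2\Big) \int \exp\Big[-\tfrac12 \sum_{i=1}^{f_1} \{x_i^2 + 2(i\alpha\theta_i - \tau_i) x_i\}\Big]\,dx \times \int \Big(\sum_{j=1}^{f_2} y_j^2\Big)^{\frac12 f_1} \exp\Big[-\tfrac12 \Big(1 + \sum_{i=1}^{f_1} \theta_i^2\Big) \sum_{j=1}^{f_2} y_j^2\Big]\,dy$$

$$= \pi^{-\frac12 f_1}\, \frac{\Gamma(\frac12 (f_1 + f_2))}{\Gamma(\frac12 f_2)} \Big(1 + \sum_{i=1}^{f_1} \theta_i^2\Big)^{-\frac12 (f_1 + f_2)} \exp\Big[-\tfrac12 \sum_{i=1}^{f_1} (\alpha^2 \theta_i^2 + 2i\alpha\tau_i\theta_i)\Big].$$

Writing

$$f_1 = k, \qquad f_2 = f + 1 - k \tag{10}$$

we get finally

$$E(e^{-\frac12 \alpha^2 L}) = \pi^{-\frac12 k}\, \frac{\Gamma(\frac12 (f + 1))}{\Gamma(\frac12 (f + 1 - k))} \int \Big(1 + \sum_{i=1}^{k} \theta_i^2\Big)^{-\frac12 (f+1)} \exp\Big[-\tfrac12 \sum_{i=1}^{k} (\alpha^2 \theta_i^2 + 2i\alpha\tau_i\theta_i)\Big]\,d\theta. \tag{11}$$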


From the identity of (8) and (11) we conclude that $T$ is distributed exactly
the same as $L$ with the appropriate “degrees of freedom” $f_1$ and $f_2$ given by (10).
But the elementary probability law of $L$ has already been derived by P. C.
Tang [5]. Using his result we immediately write down the elementary
probability law of $T$:

$$p(T) = e^{-\lambda} \sum_{h=0}^{\infty} \frac{\lambda^h}{h!}\, \frac{1}{B(\frac12 f_1 + h, \frac12 f_2)}\, T^{\frac12 f_1 + h - 1} (1 + T)^{-\frac12 (f_1 + f_2) - h}, \tag{12}$$

where $f_1$ and $f_2$ are given by (10) and

$$\lambda = \tfrac12 \sum_{i=1}^{k} \tau_i^2 = \tfrac12 \sum_{i,j=1}^{k} c_{ij} \zeta_i \zeta_j \tag{13}$$

in accordance with (5). The tables of probability integrals prepared by Tang
can, of course, be used to suit our purpose.
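The series (12) is easy to evaluate numerically. The following Python sketch is offered under stated assumptions: the function names `density_T` and `total_probability`, the truncation of the series at a fixed number of terms, and the midpoint quadrature are all choices made for illustration, not Tang's or the author's procedure.

```python
import math

def density_T(T, k, f, lam, terms=60):
    """Truncated series (12) with f1 = k, f2 = f + 1 - k and parameter lam."""
    f1, f2 = k, f + 1 - k
    b = 0.5 * f2
    total = 0.0
    for h in range(terms):
        if lam == 0.0 and h > 0:
            break  # only the h = 0 term survives when lam = 0
        a = 0.5 * f1 + h
        # log of the Poisson-type weight e^{-lam} lam^h / h!
        log_w = -lam - math.lgamma(h + 1) + (h * math.log(lam) if h > 0 else 0.0)
        # log of 1 / B(a, b)
        log_inv_beta = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        log_kernel = (a - 1.0) * math.log(T) - (a + b) * math.log1p(T)
        total += math.exp(log_w + log_inv_beta + log_kernel)
    return total

def total_probability(k, f, lam, n=2000):
    """Midpoint rule for the integral of p(T) over (0, inf), using T = u / (1 - u)."""
    total = 0.0
    for m in range(n):
        u = (m + 0.5) / n
        T = u / (1.0 - u)
        total += density_T(T, k, f, lam) / (1.0 - u) ** 2 / n
    return total

print(total_probability(2, 10, 1.5))  # close to 1
```

The check that the density integrates to approximately one is only a sanity test of the transcription of (12); probabilities of the second kind of error would be obtained by integrating the same density over $T < T_\epsilon$.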




2. An Optimum Property of the T-Test. To any reader familiar with the
Neyman-Pearson theory of testing statistical hypotheses [3, 4], the theorem
stated below may be of considerable interest.

Denote by $W$ the $k(f + 1)$-dimensional space of the $z_i$ and $z_{ir}$ and let $w$ be any
region in $W$ which may possibly serve as a critical region for the rejection of the
hypothesis $H_0$. Let us speak of a critical region $w$ as belonging to the class $D$
if $w$ satisfies the following condition:

$$\int_w p(z, z')\,d(z, z') = \epsilon + \tfrac12 a \sum_{i,j=1}^{k} c_{ij} \zeta_i \zeta_j + R, \tag{14}$$

where $\epsilon < 1$ is a positive constant independent of the $\zeta_i$, $c_{ij}$ and the region $w$,
$a$ is a constant depending on $w$ only, but not on the $\zeta_i$ or $c_{ij}$, and where $R$ for
any given set of values of the $c_{ij}$ is an infinitesimal of at least the third order
as all the $\zeta_i$ tend to zero.

THEOREM. Of all the regions belonging to the class $D$, the particular region
which gives the largest possible value to the coefficient $a$ in the equation (14) is the
region defined by $T \ge T_\epsilon$, where $T_\epsilon$ is a constant so determined that the probability,
when all $\zeta_i$ vanish, of the observed $T$ being not less than $T_\epsilon$ is exactly $\epsilon$.

The significance of the theorem is clear. Every critical region belonging to
the class $D$ serves as an unbiased exact test of the hypothesis $H_0$, $\epsilon$ being the
preassigned chance of rejecting $H_0$ if it is true. Further, as is seen from (14),
as the $\zeta_i$ start to depart from zero, the increased chance of rejecting $H_0$ due to its
falsehood is approximately proportional to the quantity $\sum c_{ij} \zeta_i \zeta_j$. The
coefficient $a$ therefore measures the power of the critical region $w$ to detect the
falsehood of $H_0$, at least when the departure of the $\zeta_i$ from zero is small. Our
theorem asserts that in this particular sense the $T$-test is the most powerful of its
kind.

The method of proof is very much the same as that by which Neyman and
Pearson proved some of their general theorems concerning unbiased tests.
However, as the present theorem has not yet been contained in their more general
results, we shall give it a full proof without referring, save in one occasion, to
these authors.

PROOF. Write

$$v_{ij} + z_i z_j = s_{ij} \qquad (i, j = 1, 2, \dots, k),$$

$$p_0(z, z') = (\sqrt{2\pi})^{-k(f+1)}\, |C|^{\frac12 (f+1)} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij} s_{ij}\Big] \tag{15}$$

and denote by $p_0(s)$ the simultaneous elementary probability law of the variables
$s_{ij}$ derived from (15). Let $W_1$ be the domain of all possible positions of the point
$(s_{11}, s_{12}, \dots, s_{kk})$ in the $\frac12 k(k + 1)$-dimensional space.

We know, although we omit the proof of it, that there is no elementary
probability law of the variables $s_{ij}$ other than $p_0(s)$ which has the same moments of
all orders as those derived from $p_0(z, z')$. It then follows that if $g(s)$ be any
summable function of the $s_{ij}$ and if

$$\int_{W_1} \Big(\prod_{i,j=1}^{k} s_{ij}^{\,r_{ij}}\Big) g(s)\, p_0(s)\,ds = 0 \tag{16}$$

for all positive integers $r_{ij}$ or zero, then we must have $g(s) = 0$ except perhaps
on a null set of points.
It follows therefore that the identity

$$\int_{W_1} g(s)\, p_0(s)\,ds = 0 \tag{17}$$

implies the identity $g(s) = 0$ provided $g(s)$ does not involve the parameters $c_{ij}$.
For, substituting for $p_0(s)$ its expression as given by Wishart [8] we shall have

$$K \int_{W_1} g(s)\, p_0(s)\,ds = \int_{W_1} g(s)\, |S|^{\frac12 (f - k)} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij} s_{ij}\Big]\,ds = 0 \tag{18}$$

where $|S| = |s_{ij}|$ and $K$ is some constant. Differentiating (18) successively
with respect to the $c_{ij}$ and dividing the results by $K$, we shall regain the equations
(16). Hence it follows that $g(s) = 0$.


This being established, let $w$ be any region belonging to $D$ and rewrite the
equation (14), so that

$$(\sqrt{2\pi})^{-k(f+1)}\, |C|^{\frac12 (f+1)} \int_w \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij} \{(z_i - \zeta_i)(z_j - \zeta_j) + v_{ij}\}\Big]\,d(z, z') = \epsilon + \tfrac12 a \sum_{i,j=1}^{k} c_{ij} \zeta_i \zeta_j + R. \tag{19}$$

Setting all the $\zeta_i$ to zero in both sides of (19), we have

$$\int_w p_0(z, z')\,d(z, z') = \epsilon \tag{20}$$

identically in the $c_{ij}$. Differentiating (19) once with respect to $\zeta_i$ and
afterwards setting all the $\zeta_i$ to zero, we easily get

$$\int_w z_i\, p_0(z, z')\,d(z, z') = 0 \qquad (i = 1, 2, \dots, k) \tag{21}$$

for all possible values of the $c_{ij}$.


Finally, differentiating (19) with respect to $\zeta_i$ and then to $\zeta_j$ and putting all
$\zeta_i = 0$ in the result we obtain

$$\int_w \Big\{\Big(\sum_{h=1}^{k} c_{ih} z_h\Big)\Big(\sum_{h=1}^{k} c_{jh} z_h\Big) - c_{ij}\Big\}\, p_0(z, z')\,d(z, z') = a\, c_{ij} \qquad (i, j = 1, 2, \dots, k)$$

whence, remembering (20),

$$\sum_{h,l=1}^{k} c_{ih} c_{jl}\, q_{hl} = \beta c_{ij} \qquad (i, j = 1, 2, \dots, k) \tag{22}$$

in which we denote by $\beta = a + \epsilon$ and

$$q_{hl} = \int_w z_h z_l\, p_0(z, z')\,d(z, z') \qquad (h, l = 1, 2, \dots, k).$$

If we denote by $Q$ the matrix of order $k$ formed of the elements $q_{hl}$, we see that
(22) may be written as

$$CQC = \beta C,$$

whence, since $C$ has its inverse matrix, $C^{-1}$,

$$Q = \beta C^{-1},$$

i.e.,

$$q_{ij} = \beta c_{ij}^{(-1)} \qquad (i, j = 1, 2, \dots, k) \tag{23}$$

where $c_{ij}^{(-1)}$ denotes the element in the matrix $C^{-1}$ which corresponds to the
element $c_{ij}$ in the matrix $C$.

Conditions (20), (21) and (23) are necessary for the region $w$ to belong to the
class $D$. They are evidently also sufficient.

Let us evaluate the integrals in (20), (21) and the $q_{ij}$ by first evaluating the
surface integrals on any surface, say $G(s)$, on which all the $s_{ij}$ have constant
values, and then integrating the results with respect to the $s_{ij}$ over a region,
say $w_1$, of the $s_{ij}$ contained in $W_1$. Thus we may write (20), (21) and (23) in
the form

$$\int_{w_1} f(s)\, p_0(s)\,ds = \epsilon, \qquad \int_{w_1} g_i(s)\, p_0(s)\,ds = 0, \qquad \int_{w_1} g_{ij}(s)\, p_0(s)\,ds = \beta c_{ij}^{(-1)} \qquad (i, j = 1, 2, \dots, k), \tag{24}$$

where

$$f(s) = \frac{1}{p_0(s)} \int_{G(s)} p_0(z, z')\,dG(s),$$

$$g_i(s) = \frac{1}{p_0(s)} \int_{G(s)} z_i\, p_0(z, z')\,dG(s),$$

$$g_{ij}(s) = \frac{1}{p_0(s)} \int_{G(s)} z_i z_j\, p_0(z, z')\,dG(s).$$

It is readily verified that the function $p_0(z, z')/p_0(s)$ is free from the parameters
$c_{ij}$, and consequently so are the functions $f(s)$, $g_i(s)$, $g_{ij}(s)$. Besides, we can
extend the definition of these functions in the whole domain $W_1$ by assigning
them the value zero outside of the region $w_1$. Doing this, and writing
$\gamma = \beta/(f + 1)$ (the expectation of $s_{ij}$ under $p_0(s)$ being $(f + 1)\, c_{ij}^{(-1)}$), we can now write the
equations (24) as

$$\int_{W_1} (f(s) - \epsilon)\, p_0(s)\,ds = 0, \qquad \int_{W_1} g_i(s)\, p_0(s)\,ds = 0, \qquad \int_{W_1} [g_{ij}(s) - \gamma s_{ij}]\, p_0(s)\,ds = 0 \qquad (i, j = 1, 2, \dots, k). \tag{25}$$


Now all the equations (25) are of the form (17); consequently, according to
the already established result and remembering the definitions of the functions
$f(s)$, $g_i(s)$ and $g_{ij}(s)$, we must have

$$\int_{G(s)} p_0(z, z')\,dG(s) = \epsilon\, p_0(s) \tag{26}$$

$$\int_{G(s)} z_i\, p_0(z, z')\,dG(s) = 0 \tag{27}$$

$$\int_{G(s)} z_i z_j\, p_0(z, z')\,dG(s) = \gamma s_{ij}\, p_0(s) \tag{28}$$

in the whole domain $W_1$.

Hence the most general region belonging to the class $D$ is constructed as
follows. On any surface $s_{ij} = \text{const.}$ $(i, j = 1, 2, \dots, k)$ we take an areal region
such that it satisfies the equations (26)-(28); we then allow the $s_{ij}$ to vary in the
whole domain $W_1$. Equations (28) may now be replaced by

$$\int_{G(s)} \Big(\frac{z_1^2}{s_{11}} - \frac{z_i z_j}{s_{ij}}\Big)\, p_0(z, z')\,dG(s) = 0 \qquad (i, j = 1, 2, \dots, k). \tag{28'}$$


Let us call $w_0$ the region defined by $T \ge T_\epsilon$. Since $w_0$ belongs to the class
$D$ (cf. (12)), its cross section, say $G_0(s)$, by any surface $s_{ij} = \text{const.}$ $(i, j = 1, 2,
\dots, k)$ must satisfy the equations (26), (27) and (28'). Since $\gamma = \frac{1}{f + 1}(a + \epsilon)$,
all we have to prove now is that among all the areal regions $G(s)$ satisfying the
equations (26), (27) and (28') it is the region $G_0(s)$ that gives the largest possible
value to $\gamma p_0(s)$. Now

$$\gamma p_0(s) = \int_{G(s)} \frac{z_1^2}{s_{11}}\, p_0(z, z')\,dG(s) \tag{29}$$

and, according to a Lemma of Neyman and Pearson [3, p. 10], the right-hand
side of (29) will attain its maximum value if $G(s)$ is defined by an inequality
of the form

$$\frac{z_1^2}{s_{11}} \ge \sum_{i,j=1}^{k} a_{ij} \Big(\frac{z_1^2}{s_{11}} - \frac{z_i z_j}{s_{ij}}\Big) + \sum_{i=1}^{k} b_i z_i + c \tag{30}$$

where the $a_{ij}$, $b_i$ and $c$ are constants so determined as to enable the region $G(s)$
to satisfy the equations (26)-(28). We shall show presently that the region
$G_0(s)$ is defined by such an inequality.

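The inequality $T \ge T_\epsilon$ may be written as

$$\frac{|v_{ij} + z_i z_j|}{|v_{ij}|} \ge 1 + T_\epsilon \qquad \text{or} \qquad \frac{|s_{ij} - z_i z_j|}{|s_{ij}|} \le \frac{1}{1 + T_\epsilon},$$

i.e.

$$\sum_{i,j=1}^{k} s_{ij}^{(-1)} z_i z_j \ge \frac{T_\epsilon}{1 + T_\epsilon}, \tag{31}$$

where $s_{ij}^{(-1)}$ denotes the $(i, j)$th element in the inverse matrix of $\|s_{ij}\|$. The
region $G_0(s)$ is therefore defined by the same inequality (31) in which we regard
the $s_{ij}$ as constants.

If we put

$$a_{ij} = \frac{1}{k}\, s_{ij} s_{ij}^{(-1)}, \qquad b_i = 0, \qquad c = \frac{1}{k}\, \frac{T_\epsilon}{1 + T_\epsilon} \qquad (i, j = 1, 2, \dots, k)$$

in (30) we can easily reduce the inequality (30) into (31).
The proof is now complete.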
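The two determinant identities behind the passage from $T \ge T_\epsilon$ to (31), namely $|v_{ij} + z_i z_j| = |v_{ij}|(1 + T)$ and $\sum s_{ij}^{(-1)} z_i z_j = T/(1 + T)$, can be checked on a small case. The sketch below is a hedged illustration for $k = 2$ in Python; the helper names and the numbers are hypothetical.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def quad_form_inverse2(m, z):
    """z' m^{-1} z for a symmetric 2x2 matrix m, computed through the adjugate."""
    adj = [[m[1][1], -m[0][1]], [-m[1][0], m[0][0]]]
    q = sum(adj[i][j] * z[i] * z[j] for i in range(2) for j in range(2))
    return Fraction(q) / Fraction(det2(m))

# Hypothetical values of v_ij and z_i.
V = [[2, 1], [1, 3]]
z = [1, 2]
S = [[V[i][j] + z[i] * z[j] for j in range(2)] for i in range(2)]  # s_ij = v_ij + z_i z_j

T = quad_form_inverse2(V, z)
print(T, Fraction(det2(S), det2(V)), quad_form_inverse2(S, z))  # 7/5 12/5 7/12
```

Here $1 + T = 12/5$ equals the ratio of the determinants, and the quadratic form in the inverse of $\|s_{ij}\|$ equals $T/(1 + T) = 7/12$, so the inequality (31) is an increasing transformation of $T \ge T_\epsilon$.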


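3. Note on Applications of T. It is already known that the $T$-test may be
used for the following purposes (a) and (b):
(a) Given a $k$-variate normal surface

$$p(x) = (\sqrt{2\pi})^{-k}\, |C|^{\frac12} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij} (x_i - \xi_i)(x_j - \xi_j)\Big]$$

with the unknown $\xi_i$ and $c_{ij}$. $n$ observations

$$(x_{1l}, x_{2l}, \dots, x_{kl}) \qquad (l = 1, 2, \dots, n)$$

having been made, it is required to test the hypothesis that the $\xi_i$ have the
particular values $\xi_i^0$ for $i = 1, 2, \dots, k$.

Here we use the $T$-test with

$$z_i = \sqrt{n}\,(\bar{x}_i - \xi_i^0), \qquad v_{ij} = \sum_{l=1}^{n} (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j),$$

$$\zeta_i = \sqrt{n}\,(\xi_i - \xi_i^0), \qquad f = n - 1,$$

where

$$\bar{x}_i = \frac{1}{n} \sum_{l=1}^{n} x_{il} \qquad (i = 1, 2, \dots, k).$$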


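(b) Given two $k$-variate normal surfaces

$$p(x) = (\sqrt{2\pi})^{-k}\, |C|^{\frac12} \exp\Big(-\tfrac12 \sum_{i,j=1}^{k} c_{ij} (x_i - \xi_i)(x_j - \xi_j)\Big)$$

$$p_1(y) = (\sqrt{2\pi})^{-k}\, |C|^{\frac12} \exp\Big(-\tfrac12 \sum_{i,j=1}^{k} c_{ij} (y_i - \eta_i)(y_j - \eta_j)\Big)$$

where the $c_{ij}$ are common to the two surfaces while all the $\xi_i$, $\eta_i$, $c_{ij}$ are
unknown. Samples of $n_1$ and $n_2$ having been drawn respectively from the two
populations, to test the hypothesis that $\xi_i = \eta_i$ for all $i$.

Let the samples be

$$(x_{1l}, x_{2l}, \dots, x_{kl}) \qquad (l = 1, 2, \dots, n_1),$$

$$(y_{1l}, y_{2l}, \dots, y_{kl}) \qquad (l = 1, 2, \dots, n_2),$$

and let

$$\bar{x}_i = \frac{1}{n_1} \sum_{l=1}^{n_1} x_{il}, \qquad \bar{y}_i = \frac{1}{n_2} \sum_{l=1}^{n_2} y_{il} \qquad (i = 1, 2, \dots, k).$$

We use the $T$-test with

$$z_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\bar{x}_i - \bar{y}_i), \qquad v_{ij} = \sum_{l=1}^{n_1} (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j) + \sum_{l=1}^{n_2} (y_{il} - \bar{y}_i)(y_{jl} - \bar{y}_j),$$

$$\zeta_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\xi_i - \eta_i), \qquad f = n_1 + n_2 - 2 \qquad (i, j = 1, 2, \dots, k).$$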
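The recipe of case (b) can be sketched in a few lines. The Python below is an illustration for $k = 2$ under assumptions: the function name `two_sample_T`, the closed-form $2 \times 2$ inverse, and the toy samples are not from the paper.

```python
import math

def two_sample_T(xs, ys):
    """Return (T, f) for two samples of bivariate observations, following case (b)."""
    n1, n2 = len(xs), len(ys)
    xbar = [sum(x[i] for x in xs) / n1 for i in range(2)]
    ybar = [sum(y[i] for y in ys) / n2 for i in range(2)]
    scale = math.sqrt(n1 * n2 / (n1 + n2))
    z = [scale * (xbar[i] - ybar[i]) for i in range(2)]
    # Pooled sums of squares and products about the respective sample means.
    v = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in xs)
          + sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys)
          for j in range(2)] for i in range(2)]
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    # T = |V|^{-1} sum V_ij z_i z_j, written out for k = 2.
    T = (v[1][1] * z[0] ** 2 - 2 * v[0][1] * z[0] * z[1] + v[0][0] * z[1] ** 2) / det
    return T, n1 + n2 - 2

# Hypothetical samples: the second is the first shifted by (1, 1).
xs = [(0, 0), (2, 0), (0, 2), (2, 2)]
ys = [(1, 1), (3, 1), (1, 3), (3, 3)]
print(two_sample_T(xs, ys))  # T is approximately 0.5, f is 6
```

By the result of Section 1, $T$ is then referred to the distribution (12) with $f_1 = k$ and $f_2 = f + 1 - k$, here $2$ and $5$.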


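A third application of $T$, which appears to be novel, is the following:
(c) Given a $(k + 1)$-variate normal surface

$$p(x) = (\sqrt{2\pi})^{-(k+1)}\, D^{\frac12} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k+1} d_{ij} (x_i - \xi_i)(x_j - \xi_j)\Big], \qquad D = |d_{ij}|,$$

where the $\xi_i$ and $d_{ij}$ are all unknown. $n$ observations

$$(x_{1l}, x_{2l}, \dots, x_{k+1,l}) \qquad (l = 1, 2, \dots, n)$$

having been made, to test the hypothesis that all the $\xi_i$ are equal.
If we put

$$y_i = x_i - x_{k+1} \qquad (i = 1, 2, \dots, k),$$

then we have a $k$-variate normal surface for the variables $y_i$:

$$p(y) = (\sqrt{2\pi})^{-k}\, |C|^{\frac12} \exp\Big[-\tfrac12 \sum_{i,j=1}^{k} c_{ij} (y_i - \eta_i)(y_j - \eta_j)\Big]$$

where $\eta_i = \xi_i - \xi_{k+1}$ $(i = 1, 2, \dots, k)$. Thus the problem is reduced to testing
the hypothesis that $\eta_i = 0$ for $i = 1, 2, \dots, k$ and therefore belongs to the
type (a). Write

$$y_{il} = x_{il} - x_{k+1,l}$$

and

$$\bar{y}_i = \frac{1}{n} \sum_{l=1}^{n} y_{il}.$$

We use the $T$-test with

$$z_i = \sqrt{n}\,\bar{y}_i, \qquad v_{ij} = \sum_{l=1}^{n} (y_{il} - \bar{y}_i)(y_{jl} - \bar{y}_j),$$

$$\zeta_i = \sqrt{n}\,\eta_i, \qquad f = n - 1.$$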


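Although there are no simple expressions for the $c_{ij}$, there is one for the
parameter $\sum c_{ij} \eta_i \eta_j$, on which alone the distribution of $T$ depends. We have indeed

$$\sum_{i,j=1}^{k} c_{ij} \eta_i \eta_j = - \begin{vmatrix} \sigma_{11} & \cdots & \sigma_{1,k+1} & \xi_1 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ \sigma_{k+1,1} & \cdots & \sigma_{k+1,k+1} & \xi_{k+1} & 1 \\ \xi_1 & \cdots & \xi_{k+1} & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{vmatrix} \Bigg/ \begin{vmatrix} \sigma_{11} & \cdots & \sigma_{1,k+1} & 1 \\ \vdots & & \vdots & \vdots \\ \sigma_{k+1,1} & \cdots & \sigma_{k+1,k+1} & 1 \\ 1 & \cdots & 1 & 0 \end{vmatrix}$$

where $\sigma_{ij}$ is the covariance between $x_i$ and $x_j$.
Expressing $T$ in terms of the original variables $x$, we have

$$T = -n \begin{vmatrix} s_{11} & \cdots & s_{1,k+1} & \bar{x}_1 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ s_{k+1,1} & \cdots & s_{k+1,k+1} & \bar{x}_{k+1} & 1 \\ \bar{x}_1 & \cdots & \bar{x}_{k+1} & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{vmatrix} \Bigg/ \begin{vmatrix} s_{11} & \cdots & s_{1,k+1} & 1 \\ \vdots & & \vdots & \vdots \\ s_{k+1,1} & \cdots & s_{k+1,k+1} & 1 \\ 1 & \cdots & 1 & 0 \end{vmatrix}$$

where

$$s_{ij} = \sum_{l=1}^{n} (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j)$$

and where

$$\bar{x}_j = \frac{1}{n} \sum_{l=1}^{n} x_{jl} \qquad (j = 1, 2, \dots, k + 1).$$

Therefore $T$ is independent of which variable has been taken as the $(k + 1)$st.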
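The closing remark, that $T$ does not depend on which variable plays the part of the $(k + 1)$st, can be checked numerically. The Python sketch below does so for three variables ($k = 2$) by applying the recipe of case (c) with each variable in turn as the reference; the function name `T_equal_means` and the data are hypothetical.

```python
def T_equal_means(data, ref):
    """T of case (c) for 3 variables, using variable `ref` as the subtracted one."""
    n = len(data)
    others = [i for i in range(3) if i != ref]
    ys = [[row[i] - row[ref] for i in others] for row in data]  # y_i = x_i - x_ref
    ybar = [sum(y[i] for y in ys) / n for i in range(2)]
    z = [n ** 0.5 * ybar[i] for i in range(2)]
    v = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys)
          for j in range(2)] for i in range(2)]
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    return (v[1][1] * z[0] ** 2 - 2 * v[0][1] * z[0] * z[1] + v[0][0] * z[1] ** 2) / det

# Hypothetical observations on three variables.
data = [(1.0, 2.0, 0.5), (2.0, 1.5, 1.0), (0.0, 3.0, 2.0), (1.5, 2.5, 0.0), (3.0, 1.0, 1.5)]
values = [T_equal_means(data, ref) for ref in range(3)]
print(values)  # three values that agree up to rounding error
```

The agreement reflects the fact that the three sets of differences are non-singular linear transforms of one another, under which the quadratic form defining $T$ is unchanged.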
UNIVERSITY COLLEGE, LONDON. 
REFERENCES 
[1] H. Hotelling, Ann. Math. Statist., Vol. 2 (1931) pp. 359-378.

[2] J. Neyman, Bull. Soc. Math. France, Vol. 63 (1935) pp. 246-266.

[3] J. Neyman and E. S. Pearson, Statist. Res. Mem., Vol. 1 (1936) pp. 1-37.

[4] J. Neyman and E. S. Pearson, Statist. Res. Mem., Vol. 2 (1938).

[5] P. C. Tang, Statist. Res. Mem., Vol. 2 (1938).

[6] E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals. Oxford Univ. Press
(1937).

[7] S. S. Wilks, Biometrika, Vol. 24 (1932) pp. 471-494.

[8] J. Wishart, Biometrika, Vol. 20A (1928) pp. 32-52.





GENERALIZATION OF THE INEQUALITY OF MARKOFF 
By A. Wald


1. Introduction. Denote by $X$ a random variable and by $M_r$ the expected
value $E|X - x_0|^r$ of $|X - x_0|^r$ for any integer $r$, where $x_0$ denotes a given
real value. $M_r$ is also called the absolute moment of order $r$ about the point $x_0$.
For any positive number $d$, denote by $P(-d < X - x_0 < d)$ the probability
that $|X - x_0| < d$. The inequality of Markoff can be written as follows

$$P(-d < X - x_0 < d) \ge 1 - \frac{M_r}{d^r}. \tag{1}$$

The inequality (1) is also called, for $r = 2$, the inequality of Tchebyscheff.
The inequality (1) can be written in the following way:

$$P(-\xi \sqrt[r]{M_r} < X - x_0 < \xi \sqrt[r]{M_r}) \ge 1 - \frac{1}{\xi^r}.$$

Substituting in the above inequality $s$ for $r$ and $\xi \sqrt[r]{M_r} / \sqrt[s]{M_s}$ for $\xi$ we get

$$P(-\xi \sqrt[r]{M_r} < X - x_0 < \xi \sqrt[r]{M_r}) \ge 1 - \frac{1}{\xi^s} \Big(\frac{\sqrt[s]{M_s}}{\sqrt[r]{M_r}}\Big)^s \tag{2}$$

where $r$ and $s$ denote any integers and $\xi$ denotes an arbitrary positive value.¹
Substituting in (2) $2k$ for $s$ and 2 for $r$, we get the inequality of K. Pearson.²
By other substitutions we get the formulae of Lurquin, Cantelli, etc.³

As is well known, the inequality (1) cannot be improved⁴ for $d \ge \sqrt[r]{M_r}$.
That is to say, to every $\epsilon > 0$ a random variable $Y$ can be given such that

$$E|Y - x_0|^r = E|X - x_0|^r \qquad \text{and} \qquad P(-d < Y - x_0 < d) < 1 - \frac{M_r}{d^r} + \epsilon.$$
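One standard way to see that (1) cannot be improved is a three-point distribution that places mass $M_r / (2 d^r)$ at each of $x_0 \pm d$ and the remainder at $x_0$. The Python sketch below builds it with exact fractions; the construction is a common textbook device offered here as an illustration, not necessarily the one in the reference cited in the footnote, and the function names are hypothetical.

```python
from fractions import Fraction

def extremal_distribution(x0, d, r, M_r):
    """Three-point law with r-th absolute moment M_r about x0, all tail mass at x0 +/- d."""
    bound = 1 - Fraction(M_r) / Fraction(d) ** r  # right-hand side of (1)
    if bound < 0:
        raise ValueError("requires d**r >= M_r")
    tail = (1 - bound) / 2
    return {x0 - d: tail, x0: bound, x0 + d: tail}

def abs_moment(dist, x0, r):
    """E|Y - x0|^r for a finite distribution given as {value: probability}."""
    return sum(p * abs(x - x0) ** r for x, p in dist.items())

def prob_within(dist, x0, d):
    """P(-d < Y - x0 < d)."""
    return sum(p for x, p in dist.items() if abs(x - x0) < d)

dist = extremal_distribution(Fraction(0), Fraction(2), 2, Fraction(1))
print(abs_moment(dist, 0, 2), prob_within(dist, 0, 2))  # 1 3/4
```

With $M_2 = 1$ and $d = 2$ the probability is exactly $1 - M_2/d^2 = 3/4$, so the bound in (1) is attained and no larger lower limit can hold for every variable with that moment.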

If the absolute moments $M_{i_1} = E(|X - x_0|^{i_1}), \dots, M_{i_j} = E(|X - x_0|^{i_j})$ of
a random variable $X$ are given (and no further data about $X$ are known), then
we shall say that $a_d$ is the “sharp” lower limit of $P(-d < X - x_0 < d)$ if the
following two conditions are fulfilled:

(1) For each random variable $Y$, for which $E|Y - x_0|^{i_1} = E|X - x_0|^{i_1}, \dots,$
$E|Y - x_0|^{i_j} = E|X - x_0|^{i_j}$, the inequality $P(-d < Y - x_0 < d) \ge a_d$ holds.


1 The formula (2) has been given by A. Guldberg, Comptes Rendus, Paris, Vol. 175, p. 679. 

2 Biometrika, Vol. XII (1918-1919) pp. 284-296. 

3 E. Lurquin, Comptes Rendus, Paris, Vol. 175, p. 681. Also Cantelli, Rendiconti della
Reale Accademia dei Lincei, 1916.

4 See for instance, R. v. Mises, Wahrscheinlichkeitsrechnung, Leipzig, Vienna, Deuticke, 
1931, p. 36. 


(2) To each $\epsilon > 0$, a random variable $Y$ can be given such that $E|Y - x_0|^{i_\nu} =
E|X - x_0|^{i_\nu}$ $(\nu = 1, \dots, j)$ and $P(-d < Y - x_0 < d) < a_d + \epsilon$.

In other words, $a_d$ is the limes inferior⁵ of the probabilities $P(-d <
Y - x_0 < d)$ formed for all random variables $Y$ for which the $i_\nu$-th absolute
moment about the point $x_0$ is equal to the $i_\nu$-th moment of $X$ about the point
$x_0$ $(\nu = 1, \dots, j)$.


PROBLEM: The absolute moments $M_{i_1}, M_{i_2}, \dots, M_{i_j}$ of a random variable
$X$ are given about the point $x_0$, where $i_1, i_2, \dots, i_j$ denote any integers and $M_{i_\nu}$
denotes the moment of order $i_\nu$ $(\nu = 1, \dots, j)$. It is required to calculate the “sharp”
lower limit of the probability $P(-d < X - x_0 < d)$ for any positive value $d$.

If only a single moment $M_r$ is given, our problem is already solved, because
the inequality (1) gives us the “sharp” lower limit for $d \ge \sqrt[r]{M_r}$ and for
$d < \sqrt[r]{M_r}$ the “sharp” limit is obviously equal to zero. But the case in which
even two moments $M_r$ and $M_s$ are given has not yet been solved. The formula
(2) gives us a limit for $P(-d < X - x_0 < d)$, but this limit is not “sharp,”
as can easily be shown.

We shall give here some results concerning the general case, and the
complete solution if only two moments $M_r$ and $M_s$ are given. We shall see that
the “sharp” limit is considerably greater than the limit given by (2).


2. Some Propositions Concerning the General Case. We shall call a random
variable $X$ non-negative if $P(X < 0) = 0$. Since the absolute moments of
the non-negative random variable $Y = |X - x_0|$ about the origin are equal
to the absolute moments of $X$ about the point $x_0$ and since $P(Y < d) =
P(-d < X - x_0 < d)$, the following proposition holds true:

(I) Denote by $M_{i_1}, \dots, M_{i_j}$ the absolute moments of order $i_1, \dots, i_j$ of a
certain random variable $X$ about the point $x_0$. The limes inferior of the
probabilities $P(-d < Y - x_0 < d)$ is equal to the limes inferior of the probabilities
$P(Z < d)$, where $P(-d < Y - x_0 < d)$ is formed for all random variables $Y$
for which the $i_\nu$-th absolute moment about $x_0$ is equal to $M_{i_\nu}$ $(\nu = 1, \dots, j)$, and
$P(Z < d)$ is formed for all non-negative random variables $Z$ for which the $i_\nu$-th
moment about the origin is equal to $M_{i_\nu}$ $(\nu = 1, \dots, j)$.

On account of the proposition (I) we can restrict ourselves to the
consideration of non-negative random variables and of the moments about the origin.
A random variable $X$ for which $k$ different values $x_1, \dots, x_k$ exist such
that the probability $p(x_i)$ of $x_i$ $(i = 1, \dots, k)$ is positive and $\sum_{i=1}^{k} p(x_i) = 1$,
is called an arithmetic random variable of degree $k$. A random variable $X$
will be called $t$-limited, if $P(-t \le X \le t) = 1$. We shall prove the following
proposition.

(II) Let us denote by $M_{i_1}, M_{i_2}, \dots, M_{i_j}$ the absolute moments of order
$i_1, \dots, i_j$ of a certain non-negative random variable $X$, about the origin. Denote
by $Q(k, t)$ the set of all non-negative $t$-limited arithmetic random variables of
degree $\le k$, for which the $i_\nu$-th moment about the origin is equal to $M_{i_\nu}$ $(\nu = 1, \dots, j)$.
$Q(k, t)$ is supposed to be not empty.⁶ Denote further by $a(d, k, t)$ the limes inferior
of the probabilities $P(Y < d)$ formed for all random variables $Y$ of the set $Q(k, t)$.
Then we can say: There exists in $Q(k, t)$ a random variable $Z$ for which $P(Z < d) =
a(d, k, t)$. If $0 < a(d, k, t) < 1$ and $Z$ is a random variable in $Q(k, t)$ for which
$P(Z < d) = a(d, k, t)$, then there exist at most $j - 1$ different positive values
$x_1, \dots, x_{j-1}$ such that $x_i \ne d$, $x_i \ne t$ and the probability $p(x_i)$ of $x_i$ is positive
$(i = 1, 2, \dots, j - 1)$.

5 The limes inferior of a set $N$ of numbers is the greatest value $y$ for which the inequality
$y \le x$ for each element $x$ of $N$ holds true. This is also called greatest lower bound.

At first we shall prove that there exists a random variable $Z$ in $Q(k, t)$ such
that $P(Z < d) = a(d, k, t)$. Since $a(d, k, t)$ is the limes inferior of $P(Y < d)$
formed for all random variables $Y$ in $Q(k, t)$, there exists in $Q(k, t)$ a sequence
$\{Z_i\}$ $(i = 1, 2, \dots, \text{ad inf.})$ of random variables, such that $\lim_{i = \infty} P(Z_i < d) =
a(d, k, t)$. Arranged in ascending order of magnitude, the values of $Z_i$ which
have a positive probability are denoted by $z_{i1}, z_{i2}, \dots, z_{i k_i}$. Since $Z_i$ is a
$t$-limited non-negative arithmetic random variable of degree $\le k$, we have
$k_i \le k$ and $0 \le z_{ir} \le t$ $(r = 1, \dots, k_i)$. It follows easily from this fact that
there exists a subsequence $\{Z_{i_\nu}\}$ $(\nu = 1, 2, \dots, \text{ad inf.})$ of $\{Z_i\}$ with two
properties: First, that the variables $Z_{i_1}, Z_{i_2}, \dots$ are of the same degree
(say $s$), that is to say $k_{i_\nu} = s$ $(\nu = 1, 2, \dots, \text{ad inf.})$; and second, that the
sequence $\{z_{i_\nu r}\}$ $(\nu = 1, 2, \dots, \text{ad inf.})$ converges for each integer $r \le s$.
Let us denote $\lim_{\nu = \infty} z_{i_\nu r}$ by $z_r$ $(r = 1, 2, \dots, s)$, and the probability that $Z_{i_\nu}$
takes the value $z_{i_\nu r}$ by $p_{i_\nu r}$. It is obvious that there exists a subsequence
$\{Z_{n_\nu}\}$ $(\nu = 1, 2, \dots, \text{ad inf.})$ of $\{Z_{i_\nu}\}$ such that the sequence $\{p_{n_\nu r}\}$ converges
with increasing $\nu$. Let us denote $\lim_{\nu = \infty} p_{n_\nu r}$ by $p_r$ $(r = 1, 2, \dots, s)$. Since
$p_{n_\nu 1} + p_{n_\nu 2} + \cdots + p_{n_\nu s} = 1$, $\sum_{r=1}^{s} p_r = 1$ must hold true. We consider the random
variable $Z$ for which the probability that $Z = z_r$ is equal to $p_r$ $(r = 1, 2, \dots, s)$
and for which no values except $z_1, \dots, z_s$ are possible. The random variable
$Z$ is obviously an element of $Q(k, t)$ and $P(Z < d) = a(d, k, t)$.

Let us consider the case in which $0 < a(d, k, t) < 1$ and denote by $Z$ a random
variable of $Q(k, t)$ for which $P(Z < d) = a(d, k, t)$. We shall prove that there
exist at most $j - 1$ different positive values $x_1, \dots, x_{j-1}$ such that $x_i \ne d$,
$x_i \ne t$ and the probability $p(x_i)$ of $x_i$ is positive $(i = 1, 2, \dots, j - 1)$. In
order to prove this statement we shall suppose that there exist $j$ different
positive points $x_1, \dots, x_j$ such that $x_i \ne d$, $x_i \ne t$ and $p(x_i) > 0$ $(i = 1, 2, \dots, j)$.
Then we can write

$$\sum_{\nu=1}^{j} x_\nu^{i_1} p(x_\nu) = M_{i_1} - \sum x^{i_1} p(x)$$
$$\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots$$
$$\sum_{\nu=1}^{j} x_\nu^{i_j} p(x_\nu) = M_{i_j} - \sum x^{i_j} p(x)$$

where the summations on the right hand sides are to be taken over all values
of $x$ which are different from $x_1, \dots, x_j$ and for which $p(x) > 0$.

6 This is certainly the case, if we choose $k$ and $t$ great enough.

Since $P(Z < d) = a(d, k, t)$ and $0 < a(d, k, t) < 1$ by hypothesis, there
exist two non-negative values $b$ and $c$, such that $b < d$, $c \ge d$, $p(b) > 0$, and
$p(c) > 0$.

We define a new arithmetic random variable $Z'$ as follows: $p'(b) = p(b) - \epsilon$,
$p'(c) = p(c) + \epsilon$, and for all other values $p'(x) = p(x)$, where $p'(x)$ denotes
the probability that $Z' = x$, and $\epsilon$ a positive number less than $p(b)$. $Z'$ is
obviously a non-negative arithmetic variable of the same degree as $Z$. The
moments about the origin of the order $i_1, i_2, \dots, i_j$ of $Z'$ will in general not
be equal to the corresponding moments of $Z$. However this can be obtained
by a small displacement of the points $x_1, \dots, x_j$ into a system of neighboring
points $\bar{x}_1, \dots, \bar{x}_j$, provided that $\epsilon$ is small enough. In order to show this,
we have only to prove that the functional determinant

$$\Delta = i_1 i_2 \cdots i_j \begin{vmatrix} \bar{x}_1^{\,i_1 - 1} & \cdots & \bar{x}_j^{\,i_1 - 1} \\ \vdots & & \vdots \\ \bar{x}_1^{\,i_j - 1} & \cdots & \bar{x}_j^{\,i_j - 1} \end{vmatrix}\, p'(x_1) \cdots p'(x_j)$$

of the functions $f_1(\bar{x}_1, \dots, \bar{x}_j) = \sum_{\nu=1}^{j} \bar{x}_\nu^{\,i_1} p'(x_\nu), \dots, f_j(\bar{x}_1, \dots, \bar{x}_j) = \sum_{\nu=1}^{j} \bar{x}_\nu^{\,i_j} p'(x_\nu)$
does not vanish at the point $\bar{x}_1 = x_1, \dots, \bar{x}_j = x_j$. Since $p'(x_1), p'(x_2), \dots,
p'(x_j)$ are not equal to zero, we have only to show that

$$\Delta^* = \begin{vmatrix} x_1^{\,i_1 - 1} & \cdots & x_j^{\,i_1 - 1} \\ \vdots & & \vdots \\ x_1^{\,i_j - 1} & \cdots & x_j^{\,i_j - 1} \end{vmatrix} = x_1^{\,i_1 - 1} \cdots x_j^{\,i_1 - 1} \begin{vmatrix} 1 & \cdots & 1 \\ x_1^{\,i_2 - i_1} & \cdots & x_j^{\,i_2 - i_1} \\ \vdots & & \vdots \\ x_1^{\,i_j - i_1} & \cdots & x_j^{\,i_j - i_1} \end{vmatrix} \ne 0$$

where $i_2 - i_1, \dots, i_j - i_1$ can be assumed positive by denoting by $i_1$ the
smallest of the integers $i_1, i_2, \dots, i_j$.


Let us consider the polynomial in $x$ given by

$$R(x) = \begin{vmatrix} 1 & \cdots & 1 & 1 \\ x_1^{\,i_2 - i_1} & \cdots & x_{j-1}^{\,i_2 - i_1} & x^{\,i_2 - i_1} \\ \vdots & & \vdots & \vdots \\ x_1^{\,i_j - i_1} & \cdots & x_{j-1}^{\,i_j - i_1} & x^{\,i_j - i_1} \end{vmatrix}.$$

According to a well-known algebraic proposition, the number of positive roots
of $R(x)$ is less than or equal to the number of changes of sign in the sequence of
coefficients of $R(x)$. Since the number of changes of sign in $R(x)$ is obviously
less than or equal to $j - 1$, the number of positive roots of $R(x)$ is also less
than or equal to $j - 1$. On the other hand $x = x_1, \dots, x = x_{j-1}$ are $j - 1$
positive roots of $R(x)$. Hence for any positive value $x \ne x_1, \ne x_2, \dots,
\ne x_{j-1}$, the polynomial $R(x)$ does not vanish. Thus $R(x_j)$ and therefore also
$\Delta^*$ and $\Delta$ are not equal to zero.
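A small instance may make the root-counting argument concrete. The Python sketch below takes $j = 3$ with exponent differences $0, 1, 3$ (for example $i_1, i_2, i_3 = 2, 3, 5$) and points $x_1 = 1$, $x_2 = 2$; these numbers and the helper names are assumptions chosen for the illustration.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small j)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for c, a in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * a * det(minor)
    return total

def R(x, points, exponents):
    """Determinant with columns (p^e for e in exponents), p running over points then x."""
    cols = [Fraction(p) for p in points] + [Fraction(x)]
    return det([[c ** e for c in cols] for e in exponents])

points, exponents = (1, 2), (0, 1, 3)
# Expanding by hand gives R(x) = x**3 - 7*x + 6 = (x - 1)(x - 2)(x + 3).
print(R(1, points, exponents), R(2, points, exponents), R(3, points, exponents))  # 0 0 12
```

The coefficient sequence $1, -7, 6$ has two changes of sign, so there are at most two positive roots; these are $x_1$ and $x_2$, and $R$ is non-zero at every other positive point, as the argument requires.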

Let us denote by $Z^*$ the random variable which we get from $Z'$ by a small
displacement of the points $x_1, \dots, x_j$ into a system of neighboring points
$\bar{x}_1, \dots, \bar{x}_j$, such that the moment of order $i_\nu$ of $Z^*$ about the origin becomes
equal to $M_{i_\nu}$ $(\nu = 1, 2, \dots, j)$. By choosing $\epsilon$ small enough we can obtain
the values $\bar{x}_1, \dots, \bar{x}_j$ as near to $x_1, \dots, x_j$ as we like. In particular, $\epsilon$ can be
chosen so small that $\bar{x}_1, \dots, \bar{x}_j$ are positive numbers less than $t$, and $\bar{x}_i > d$
or $< d$ accordingly as $x_i >$ or $< d$. Then $Z^*$ is obviously an element of
$Q(k, t)$. But for $Z^*$

$$P(Z^* < d) = P(Z' < d) = P(Z < d) - \epsilon = a(d, k, t) - \epsilon$$

holds true, which is a contradiction because $a(d, k, t)$ is the limes inferior of
$P(Y < d)$ formed for all random variables $Y$ contained in $Q(k, t)$. Hence our
assumption that there exist $j$ different positive numbers $x_1, \dots, x_j$, for which
$x_i \ne d$, $x_i \ne t$ and $p(x_i) > 0$ $(i = 1, 2, \dots, j)$, cannot be true, and the
proposition II is proved in all its parts.

It follows from proposition II that $a(d, k, t)$ is independent of $k$. On
account of this fact and of the fact that any random variable $X$ can be
arbitrarily well approximated by arithmetic random variables, we get the
proposition:

III. Let us denote by $M_{i_1}, \dots, M_{i_j}$ the moments about the origin of order
$i_1, \dots, i_j$ of a certain non-negative random variable. Denote by $Q(t)$ the set
of all non-negative $t$-limited random variables for which the $i_\nu$-th moment about
the origin is equal to $M_{i_\nu}$ $(\nu = 1, \dots, j)$. Denote further by $a(d, t)$ the limes
inferior of the probabilities $P(Y < d)$ formed for all random variables $Y$
contained in $Q(t)$. Then we can say: There exists in $Q(t)$ a random variable $Z$ for
which $P(Z < d) = a(d, t)$. If $0 < a(d, t) < 1$ and $Z$ is a random variable for
which $P(Z < d) = a(d, t)$, then there exist at most $j - 1$ different positive numbers
$x_1, \dots, x_{j-1}$, such that $x_i \neq d$, $x_i \neq t$, and the probability that $Z = x_i$ is
positive $(i = 1, 2, \dots, j - 1)$.

It is obvious that $a(d, t)$ decreases monotonically with increasing $t$. Hence
$\lim_{t \to \infty} a(d, t)$ exists and it can be easily shown that:

$a(d, t)$ converges towards $a_d$ if $t \to \infty$.


3. Solution of the Problem if Only Two Moments are Given. Let us denote
by $M_r$ and $M_s$ the absolute moments respectively of order $r$ and $s$ about the
point $x_0$ of a certain random variable $X$, where $r$ and $s$ $(r < s)$ denote any
integers.

Let us first consider the case

$$\frac{M_r}{d^r} \le \frac{M_s}{d^s}. \tag{$\alpha$}$$

It follows from (1) that

$$a_d \ge 1 - \frac{M_r}{d^r}.$$

We shall show that $a_d = 1 - \frac{M_r}{d^r}$ if $\frac{M_r}{d^r} \le 1$. For this purpose let us consider
the arithmetic random variable $Y_b$ of degree 3 defined as follows:

$$p(x_0 + d) = \frac{M_r}{d^r} - \frac{\epsilon}{2}, \qquad p(x_0 + d + b) = \frac{\epsilon}{2}\left(\frac{d}{d + b}\right)^r,$$
$$p(x_0) = 1 - p(x_0 + d) - p(x_0 + d + b),$$

where $\epsilon$ is a positive number and $p(u)$ denotes the probability for $Y_b = u$.

The $r$-th moment about $x_0$ of $Y_b$ is obviously equal to $M_r$. On account of ($\alpha$)
the $s$-th moment of $Y_b$ about $x_0$ is less than or equal to $M_s$ for $b = 0$. On the
other hand the $s$-th moment of $Y_b$ about $x_0$ will be greater than $M_s$ if $b$ is
sufficiently large. Hence there exists a non-negative value $b_0$ such that the $s$-th
moment of $Y_{b_0}$ is equal to $M_s$.












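Since

$$P(-d < Y_{b_0} - x_0 < d) = 1 - \frac{M_r}{d^r} + \frac{\epsilon}{2} - \frac{\epsilon}{2}\left(\frac{d}{d + b_0}\right)^r < 1 - \frac{M_r}{d^r} + \frac{\epsilon}{2}$$

and since $\epsilon$ can be chosen arbitrarily small, we have

$$a_d = 1 - \frac{M_r}{d^r}.$$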
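The construction of $Y_{b_0}$ can be checked numerically. The following is a minimal sketch, assuming the three-point variable has the masses $M_r/d^r - \epsilon/2$ at offset $d$ and $(\epsilon/2)(d/(d+b))^r$ at offset $d + b$ from $x_0$ (the form implied by the probability statement above); the function names and the numerical inputs are illustrative only.

```python
def three_point_variable(m_r, m_s, d, r, s, eps):
    """Return (b0, [(offset from x0, probability), ...]) for Y_{b0}.

    Assumes case (alpha), i.e. m_r / d**r <= m_s / d**s, and eps / 2 <= m_r / d**r.
    """
    half = eps / 2.0
    # The s-th moment of Y_b is d^(s-r) M_r - half d^s + half d^r (d+b)^(s-r);
    # setting it equal to M_s and solving for b gives b0 in closed form.
    base = (m_s - d ** (s - r) * m_r + half * d ** s) / (half * d ** r)
    b0 = base ** (1.0 / (s - r)) - d
    p_d = m_r / d ** r - half
    p_far = half * (d / (d + b0)) ** r
    return b0, [(0.0, 1.0 - p_d - p_far), (d, p_d), (d + b0, p_far)]


def moment(dist, k):
    """k-th absolute moment about x0 of a list of (offset, probability) pairs."""
    return sum(p * abs(x) ** k for x, p in dist)


# Hypothetical inputs: r = 1, s = 2, M_r = 1, M_s = 3, d = 2, eps = 0.1.
b0, dist = three_point_variable(1.0, 3.0, 2.0, 1, 2, 0.1)
inside = dist[0][1]  # P(-d < Y - x0 < d): only the mass at x0 lies strictly inside
```

With these inputs the mass inside the interval lies between $1 - M_r/d^r$ and $1 - M_r/d^r + \epsilon/2$, as the inequality above requires.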





If $\frac{M_r}{d^r} > 1$, then $a_d$ is equal to zero, because $a_d$ decreases monotonically with
decreasing $d$ and $a_d = 0$ for $d = \sqrt[r]{M_r}$.
We have now to consider the case

$$\frac{M_r}{d^r} > \frac{M_s}{d^s}. \tag{$\beta$}$$

First we shall show that

$$\frac{M_r}{d^r} < 1. \tag{3}$$

In fact, if $\frac{M_r}{d^r}$ were $\ge 1$, then making use of ($\beta$) we have
$\left(\frac{M_r}{d^r}\right)^{s/r} \ge \frac{M_r}{d^r} > \frac{M_s}{d^s}$
and hence $(M_r)^{s/r} > M_s$. But this is not possible, because according to the
well-known inequalities between moments, $(M_r)^{s/r}$ is less than or equal to $M_s$.
It follows from (3) and ($\beta$) that

$$\frac{M_s}{d^s} < 1. \tag{4}$$






In order to calculate $a_d$, we shall apply the propositions found in section 2.
On account of proposition I, $a_d$ is equal to the limes inferior of $P(Y < d)$
where $P(Y < d)$ is formed for all non-negative random variables $Y$ for which
the $r$-th moment about the origin is equal to $M_r$ and the $s$-th moment about the
origin is equal to $M_s$. Hence we can restrict ourselves to the consideration of
non-negative random variables and of the moments about the origin.

We shall show that $0 < a(d, t)$ holds for any positive value $t$. In order to
prove this, it is sufficient to show that $a_d > 0$, since $a(d, t) \ge a_d$. It follows
from the inequality (1) that $a_d \ge 1 - \frac{M_r}{d^r}$. Since, according to (3), $\frac{M_r}{d^r} < 1$,
we have $a_d > 0$, and therefore also

$$a(d, t) > 0. \tag{5}$$


Let us see whether $a(d, t) < 1$. If $M_s = (M_r)^{s/r}$, then, as is well-known, only
a single non-negative random variable $X$ exists for which the $r$-th moment
about the origin is equal to $M_r$ and the $s$-th moment is equal to $(M_r)^{s/r}$, namely
the arithmetic random variable $X$ of degree 1 for which the probability that
$X = \sqrt[r]{M_r}$ is equal to 1. Since $\sqrt[r]{M_r} < d$, as can be seen from (3), we have
$P(X < d) = 1$, and therefore $a_d = 1$. Hence in this case our problem is
already solved and we have to consider only the alternative:

$$M_s = M_r^{s/r} + c \qquad (c > 0). \tag{6}$$


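We shall show that $a(d, t) < 1$ for $t > \sqrt[r]{M_r} + d$. For this purpose let us
consider the non-negative arithmetic random variable $Y_\epsilon$ of degree 3 defined
as follows:

$$p\left(\sqrt[r]{M_r}\right) = 1 - \epsilon, \qquad p(t) = \epsilon\,\frac{M_r}{t^r} < \epsilon,$$
$$p(0) = 1 - p\left(\sqrt[r]{M_r}\right) - p(t) = \epsilon - \epsilon\,\frac{M_r}{t^r},$$

where $p(u)$ denotes the probability for $Y_\epsilon = u$, and $\epsilon$ is a positive number $< 1$.
The $r$-th moment of $Y_\epsilon$ is equal to

$$M_r\,p\left(\sqrt[r]{M_r}\right) + t^r p(t) = M_r.$$

The $s$-th moment of $Y_\epsilon$ is given by the expression

$$A = M_r^{s/r}\,p\left(\sqrt[r]{M_r}\right) + t^s p(t) = (1 - \epsilon) M_r^{s/r} + \epsilon\,t^{s-r} M_r.$$

On account of (6), $A$ is less than $M_s$ for $\epsilon = 0$. For $\epsilon = 1$ we have

$$A = t^{s-r} M_r > d^{s-r} M_r.$$

Since from ($\beta$) $d^{s-r} M_r > M_s$, we have $A > M_s$ for $\epsilon = 1$. Hence there exists
a positive value $\epsilon_0 < 1$ for which $A = M_s$. Thus the $r$-th moment of $Y_{\epsilon_0}$ is
equal to $M_r$ and the $s$-th moment of $Y_{\epsilon_0}$ is equal to $M_s$. We have

$$P(Y_{\epsilon_0} < d) = p(0) + p\left(\sqrt[r]{M_r}\right) = \epsilon_0 - \epsilon_0\,\frac{M_r}{t^r} + 1 - \epsilon_0 = 1 - \epsilon_0\,\frac{M_r}{t^r} < 1.$$

Hence

$$a(d, t) < 1. \tag{7}$$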


On account of (5) and (7) it follows from proposition III that there exists a
non-negative arithmetic random variable $X$ belonging to the set $Q(t)$ such that
$P(X < d) = a(d, t)$ and there exists at most one positive value $\delta$ $(\neq d, \neq t)$
with positive probability. Hence $a(d, t)$ is equal to the limes inferior of the
probabilities $P(Y < d)$ formed for all non-negative arithmetic random variables
$Y$ which have the following two properties:

(1) The $r$-th moment about the origin is equal to $M_r$ and the $s$-th moment
about the origin is equal to $M_s$.

(2) There exists at most a single positive value $\delta$ $(\neq d, \neq t)$ with positive
probability.


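Denote by $Z$ a non-negative $t$-limited random variable with the properties
(1), (2), and for which $P(Z < d) = a(d, t)$. The following equations hold

$$
\begin{aligned}
p(0) + p(\delta) + p(d) + p(t) &= 1, \\
p(\delta)\delta^r + p(d)d^r + p(t)t^r &= M_r, \\
p(\delta)\delta^s + p(d)d^s + p(t)t^s &= M_s,
\end{aligned} \tag{8}
$$

where $p(u)$ denotes the probability that $Z = u$.
From the last two equations of (8), we get

$$p(\delta) = \frac{M_r d^{s-r} - M_s + p(t)\left(t^s - t^r d^{s-r}\right)}{\delta^r\left(d^{s-r} - \delta^{s-r}\right)}, \tag{9}$$

$$p(d) = \frac{M_s - \delta^{s-r} M_r + p(t)\left(t^r \delta^{s-r} - t^s\right)}{d^r\left(d^{s-r} - \delta^{s-r}\right)}. \tag{10}$$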


Since $\frac{M_r}{d^r} > \frac{M_s}{d^s}$ and $t > d$, the numerator in (9) is positive. Since $0 \le p(\delta) \le 1$,
the inequality

$$0 < \delta < d \tag{11}$$

must hold. Hence

$$p(\delta) > 0. \tag{12}$$


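We shall show that $p(t) = 0$ if $t$ is sufficiently large. For this purpose let
us make the assumption $p(t) > 0$. We define a new random variable $Z'$ as
follows:

$$
\begin{aligned}
p'(t) &= p(t) - \epsilon \qquad \text{where } 0 < \epsilon < p(t), \\
p'(d) &= p(d) - \epsilon\,\frac{t^r \delta^{s-r} - t^s}{d^r\left(d^{s-r} - \delta^{s-r}\right)}, \\
p'(\delta) &= p(\delta) - \epsilon\,\frac{t^s - t^r d^{s-r}}{\delta^r\left(d^{s-r} - \delta^{s-r}\right)}, \\
p'(0) &= 1 - p'(\delta) - p'(d) - p'(t),
\end{aligned}
$$

and $p'(z) = 0$ for all values $z \neq 0, \neq \delta, \neq d, \neq t$. Here
$p'(u)$ denotes the probability that $Z' = u$.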


The equations (8) remain satisfied if we substitute $p'(0)$, $p'(\delta)$, $p'(d)$, and
$p'(t)$ for $p(0)$, $p(\delta)$, $p(d)$, and $p(t)$ respectively. Hence the $r$-th moment of $Z'$
is equal to $M_r$ and the $s$-th moment is equal to $M_s$. We have to show that $Z'$
is in fact a random variable, that is to say, that the defined probabilities are
$\ge 0$ and $\le 1$. It is sufficient to show that the defined probabilities are
non-negative, because the sum of them is equal to 1 and therefore they must be $\le 1$.

Obviously $p'(t)$ is $> 0$. Since $t > d$ and according to (11) $d > \delta$, we have
$p'(d) > p(d) \ge 0$. According to (12), $p(\delta)$ is positive. Hence for $\epsilon$ sufficiently
small $p'(\delta)$ is also positive. We have to show that also $p'(0) \ge 0$. $p'(0)$ is
given by

$$
\begin{aligned}
p'(0) &= 1 - p'(\delta) - p'(d) - p'(t) \\
&= 1 - p(\delta) - p(d) - p(t) + \epsilon\left[1 + \frac{t^r \delta^{s-r} - t^s}{d^r\left(d^{s-r} - \delta^{s-r}\right)} + \frac{t^s - t^r d^{s-r}}{\delta^r\left(d^{s-r} - \delta^{s-r}\right)}\right] \\
&= p(0) + \epsilon\,\frac{d^r \delta^r\left(d^{s-r} - \delta^{s-r}\right) + t^s\left(d^r - \delta^r\right) - t^r\left(d^s - \delta^s\right)}{d^r \delta^r\left(d^{s-r} - \delta^{s-r}\right)}.
\end{aligned}
$$

Since $p(0) \ge 0$, $\epsilon > 0$, $d > \delta$ and $s > r$, this last expression is positive if $t$ is
sufficiently large. We may assume $t$ so great that $p'(0) > 0$, because we want
to calculate only

$$a_d = \lim_{t \to \infty} a(d, t).$$


Now we shall show that

$$p'(d) + p'(t) > p(d) + p(t).$$

In fact

$$p'(d) + p'(t) - p(d) - p(t) = \epsilon\,\frac{t^s - t^r \delta^{s-r}}{d^r\left(d^{s-r} - \delta^{s-r}\right)} - \epsilon > 0,$$

because $t > d$ gives $t^r\left(t^{s-r} - \delta^{s-r}\right) > d^r\left(d^{s-r} - \delta^{s-r}\right)$.
Then

$$p'(0) + p'(\delta) < p(0) + p(\delta) = a(d, t)$$

must follow. Since $p'(0) + p'(\delta) = P(Z' < d)$, we have a contradiction and
therefore the assumption $p(t) > 0$ is reduced to an absurdity. Hence $p(t)$
must be equal to zero and $a(d, t) = a_d$. If we substitute zero for $p(t)$ in (8),
(9), and (10) we obtain:


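$$
\begin{aligned}
p(0) + p(\delta) + p(d) &= 1, \\
p(\delta)\delta^r + p(d)d^r &= M_r, \\
p(\delta)\delta^s + p(d)d^s &= M_s,
\end{aligned} \tag{13}
$$

$$p(\delta) = \frac{M_r d^{s-r} - M_s}{\delta^r\left(d^{s-r} - \delta^{s-r}\right)}, \tag{14}$$

$$p(d) = \frac{M_s - M_r \delta^{s-r}}{d^r\left(d^{s-r} - \delta^{s-r}\right)}. \tag{15}$$

We shall prove that $p(0) = 0$. For this purpose let us make the assumption
$p(0) > 0$. Denote by $\delta_1$ a positive number $< \delta$ and let us consider the
arithmetic random variable $Z'$ of degree 3 defined as follows:

$$
\begin{aligned}
p'(\delta_1) &= \frac{M_r d^{s-r} - M_s}{\delta_1^r\left(d^{s-r} - \delta_1^{s-r}\right)}, \\
p'(d) &= \frac{M_s - M_r \delta_1^{s-r}}{d^r\left(d^{s-r} - \delta_1^{s-r}\right)}, \\
p'(0) &= 1 - p'(\delta_1) - p'(d).
\end{aligned}
$$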

The $r$-th moment of $Z'$ is evidently equal to $M_r$ and the $s$-th moment to $M_s$.
Since $p(\delta) > 0$ according to (12), and $p(0) > 0$ by hypothesis, $p'(0)$ and $p'(\delta_1)$
will be greater than zero if $\delta_1$ is sufficiently near to $\delta$. The derivative of $p'(d)$
with respect to $\delta_1$ is given by

$$
\begin{aligned}
\frac{d\,p'(d)}{d\delta_1}
&= \frac{-M_r (s - r)\,\delta_1^{s-r-1}\left(d^{s-r} - \delta_1^{s-r}\right) + (s - r)\,\delta_1^{s-r-1}\left(M_s - M_r \delta_1^{s-r}\right)}{d^r\left(d^{s-r} - \delta_1^{s-r}\right)^2} \\
&= \frac{(s - r)\,\delta_1^{s-r-1}}{d^r\left(d^{s-r} - \delta_1^{s-r}\right)^2}\left(M_s - M_r d^{s-r}\right).
\end{aligned}
$$

Since $\frac{M_r}{d^r} > \frac{M_s}{d^s}$, the above expression is negative. Hence $p'(d)$ decreases
with increasing $\delta_1$. Since $\delta_1 < \delta$, we have

$$p'(d) > p(d) \ge 0$$

and therefore

$$1 - p'(d) < 1 - p(d) = a_d.$$

Since $1 - p'(d) = P(Z' < d)$, we have a contradiction and the assumption
$p(0) > 0$ is proved an absurdity. Hence $p(0) = 0$, and $p(\delta) + p(d) = 1$.
From (13), (14) and (15) we have

$$p(\delta) + p(d) = \frac{M_r d^s - M_s d^r + M_s \delta^r - M_r \delta^s}{d^r \delta^r\left(d^{s-r} - \delta^{s-r}\right)} = 1.$$

Hence

$$M_r d^s - M_s d^r + \delta^r\left(M_s - d^s\right) + \delta^s\left(d^r - M_r\right) = 0. \tag{16}$$

The equation (16) in $\delta$ has at most two positive roots, because the derivative
of the left hand side of (16)

$$r\delta^{r-1}\left(M_s - d^s\right) + s\delta^{s-1}\left(d^r - M_r\right)$$

has exactly one positive root in $\delta$. Since $\delta = d$ is a root of (16), the value of $\delta$
which we are seeking must be the second positive root of (16), which we shall
denote by $\delta_0$.

It can be easily shown that $\delta_0 < \sqrt[r]{M_r} < d$. In fact, for $\delta = 0$ the left
hand side of (16) is positive on account of the assumption ($\beta$) and for $\delta = \sqrt[r]{M_r}$
it becomes equal to

$$M_s\left(M_r - d^r\right) - M_r^{s/r}\left(M_r - d^r\right) = \left(M_s - M_r^{s/r}\right)\left(M_r - d^r\right).$$

Since $M_s \ge M_r^{s/r}$ and recalling from (3) that $M_r < d^r$, the above expression is
less than or equal to 0. Hence $\delta_0$ lies between 0 and $\sqrt[r]{M_r} < d$.
Hence $a_d$ is given by the expression

$$a_d = 1 - p(d) = 1 - \frac{M_s - M_r \delta_0^{s-r}}{d^r\left(d^{s-r} - \delta_0^{s-r}\right)}. \tag{17}$$
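For general $r < s$ the root $\delta_0$ has no closed form, but the sign change of the left hand side of (16) between $0$ and $\sqrt[r]{M_r}$ makes it easy to bracket. The sketch below locates $\delta_0$ by bisection and evaluates (17); bisection is a choice made here for illustration, and the function names and test inputs are hypothetical.

```python
def lhs16(delta, m_r, m_s, d, r, s):
    """Left hand side of equation (16) as a function of delta."""
    return (m_r * d ** s - m_s * d ** r
            + delta ** r * (m_s - d ** s)
            + delta ** s * (d ** r - m_r))


def delta0(m_r, m_s, d, r, s, tol=1e-12):
    """Root of (16) in (0, M_r^(1/r)], assuming case (beta): m_r/d**r > m_s/d**s."""
    lo, hi = 0.0, m_r ** (1.0 / r)  # lhs16 > 0 at lo and <= 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs16(mid, m_r, m_s, d, r, s) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sharp_bound(m_r, m_s, d, r, s):
    """Formula (17): a_d = 1 - p(d) evaluated at delta0."""
    d0 = delta0(m_r, m_s, d, r, s)
    p_d = (m_s - m_r * d0 ** (s - r)) / (d ** r * (d ** (s - r) - d0 ** (s - r)))
    return 1.0 - p_d


# Hypothetical inputs: moments of X uniform on {0, 2} (M_1 = 1, M_3 = 4), d = 3.
a_d = sharp_bound(1.0, 4.0, 3.0, 1, 3)
root = delta0(1.0, 4.0, 3.0, 1, 3)
```

The two-point variable with mass `a_d` at `root` and mass `1 - a_d` at `d` then reproduces both prescribed moments, which is a quick consistency check on (13).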
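For $s = 2r$ the root $\delta_0$ can be easily calculated. We get

$$\delta_0 = \sqrt[r]{\frac{M_{2r} - d^r M_r}{M_r - d^r}}. \tag{18}$$

If we substitute in (17) $2r$ for $s$ and the right hand side of (18) for $\delta_0^r$, then
we get

$$
\begin{aligned}
a_d &= 1 - \frac{M_{2r} - M_r\,\dfrac{M_{2r} - d^r M_r}{M_r - d^r}}{d^r\left(d^r - \dfrac{M_{2r} - d^r M_r}{M_r - d^r}\right)} \\
&= 1 - \frac{\left(M_r - d^r\right) M_{2r} - M_r\left(M_{2r} - d^r M_r\right)}{d^r\left[d^r\left(M_r - d^r\right) - M_{2r} + M_r d^r\right]} \\
&= 1 - \frac{d^r\left(M_r^2 - M_{2r}\right)}{d^r\left[2 M_r d^r - d^{2r} - M_{2r}\right]} \\
&= 1 - \frac{M_r^2 - M_{2r}}{2 M_r d^r - d^{2r} - M_{2r}}.
\end{aligned}
$$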





Let us denote the non-negative number $M_{2r} - M_r^2$ by $\sigma^2$; then we obtain⁷

$$a_d = 1 - \frac{\sigma^2}{\left(d^r - M_r\right)^2 + \sigma^2} \qquad \left(\sigma^2 = M_{2r} - M_r^2\right). \tag{19}$$

Let us compare the "sharp" limit given by (19) with the limit given by (2). If
we substitute, in (2), $2r$ for $s$ and $d$ for $\xi\sqrt[s]{M_s}$ we have

$$b_d = 1 - \frac{M_{2r}}{d^{2r}} = 1 - \left(\frac{M_r}{d^r}\right)^2 - \frac{\sigma^2}{d^{2r}}$$

as a lower limit of the probability $P(-d < X - x_0 < d)$. We see that for
small values of $\sigma^2$, $b_d$ is considerably smaller than $a_d$.
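The comparison can be made concrete with exact arithmetic. The following sketch evaluates (19), the weaker limit $b_d = 1 - M_{2r}/d^{2r}$, and the extremal two-point variable from (18), (14), (15) with $s = 2r$; the function names and the sample moments are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def sharp_bound_2r(m_r, m_2r, d, r):
    """Formula (19); valid in case (beta), i.e. when m_r * d**r > m_2r."""
    sigma2 = m_2r - m_r ** 2
    return 1 - sigma2 / ((d ** r - m_r) ** 2 + sigma2)


def weak_bound_2r(m_2r, d, r):
    """The limit b_d = 1 - M_2r / d^(2r) used for comparison."""
    return 1 - m_2r / d ** (2 * r)


def extremal_two_point(m_r, m_2r, d, r):
    """Return (delta0**r, p(delta0), p(d)) for the variable attaining a_d."""
    y0 = (m_2r - d ** r * m_r) / (m_r - d ** r)          # delta0**r, from (18)
    p_d = (m_2r - m_r * y0) / (d ** r * (d ** r - y0))   # from (15) with s = 2r
    return y0, 1 - p_d, p_d


# Hypothetical inputs: r = 1, M_1 = 1, M_2 = 2 (so sigma^2 = 1), d = 3.
M1, M2, D = Fraction(1), Fraction(2), Fraction(3)
a_d = sharp_bound_2r(M1, M2, D, 1)
b_d = weak_bound_2r(M2, D, 1)
y0, p_delta, p_d = extremal_two_point(M1, M2, D, 1)
```

For these inputs the sharp limit is $4/5$ against $7/9$ for $b_d$, and the extremal variable puts mass $4/5$ at $\delta_0 = 1/2$ and mass $1/5$ at $d = 3$.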
Our results may be summarized in the following

THEOREM: Denote by $M_r$ the $r$-th and by $M_s$ the $s$-th absolute moment of a
random variable $X$ about the point $x_0$, where $r < s$. For any positive value $d$
denote by $P(-d < X - x_0 < d)$ the probability that $|X - x_0| < d$. The "sharp"
lower limit $a_d$ of $P(-d < X - x_0 < d)$ is defined as the limes inferior of the
probabilities $P(-d < Y - x_0 < d)$ formed for all random variables $Y$ for which
the $r$-th moment about $x_0$ is equal to $M_r$ and the $s$-th moment about $x_0$ is equal to
$M_s$. We have to distinguish two cases.

I. $\frac{M_r}{d^r} \le \frac{M_s}{d^s}$. In this case $a_d = 1 - \frac{M_r}{d^r}$ if $\frac{M_r}{d^r} \le 1$, and $a_d = 0$ if $\frac{M_r}{d^r} > 1$.

II. $\frac{M_r}{d^r} > \frac{M_s}{d^s}$. In this case $a_d$ is given by

$$a_d = 1 - \frac{M_s - M_r \delta_0^{s-r}}{d^r\left(d^{s-r} - \delta_0^{s-r}\right)}, \tag{17}$$

where $\delta_0$ is the positive root $\neq d$ of the equation⁸ in $\delta$

$$M_r d^s - M_s d^r + \delta^r\left(M_s - d^s\right) + \delta^s\left(d^r - M_r\right) = 0.$$

For $s = 2r$ we have

$$\delta_0 = \sqrt[r]{\frac{M_{2r} - d^r M_r}{M_r - d^r}}.$$

If we substitute in (17) $2r$ for $s$ and the above expression for $\delta_0$, we obtain

$$a_d = 1 - \frac{\sigma^2}{\left(d^r - M_r\right)^2 + \sigma^2},$$

where $\sigma^2 = M_{2r} - M_r^2$.


COWLES COMMISSION, COLORADO SPRINGS.


7 The case $s = 2r$ has been treated also by Cantelli. He demonstrated the formula (19)
in quite another way, which cannot be generalized for the case $s \neq 2r$. Cantelli's formula
and its demonstration are given in the book of M. Fréchet, Généralités sur les Probabilités.
Variables Aléatoires, Paris, 1937, pp. 123-126.

8 As has been shown, there exists exactly one positive root $\neq d$ of the equation
considered.






















A MODIFICATION OF BAYES’ PROBLEM 
By R. v. Mises 


The classical Bayes problem can be stated as follows. We consider an urn
which contains white and black balls (or balls designated by 0 and 1). The
probability $p$ for drawing a black ball is unknown. But there is given a
probability function $F(x)$ representing the a priori probability for the inequality
$p \le x$. We draw $n$ times from the urn (returning each time the extracted ball)
and get a black ball $m$ times and a white one $n - m$ times. Now, after this
experiment, we ask for the a posteriori probability $P_n(x)$ for the relation $p \le x$.

The solution proposed by Bayes can be written in a slightly generalized form:

$$P_n(x) = K \int_0^x p^m (1 - p)^{n-m}\,dF(p) \tag{1}$$

where $K$ is a constant to be found by means of the condition

$$P_n(1) = 1. \tag{1'}$$

We are interested in the behaviour of $P_n(x)$ if $n$ tends to $\infty$ under the condition

$$\lim_{n \to \infty} \frac{m}{n} = a. \tag{2}$$

Laplace found in the case of a priori equipartition $F(x) = x$, and I proved in 1919¹
for any derivable $F(x)$, that $P_n(x)$ tends to a normal distribution:

$$\lim_{n \to \infty}\left[P_n(x) - \frac{1}{\sqrt{\pi}}\int_{-\infty}^{u} e^{-z^2}\,dz\right] = 0 \tag{3}$$

with $u = H_n(x - A_n)$,

$$A_n = a, \qquad \frac{1}{2H_n^2} = \frac{a(1 - a)}{n}. \tag{4}$$

It is easily seen from (3) and (4) that

$$\lim_{n \to \infty} P_n(x) = \begin{cases} 0 & \text{if } x < a, \\ 1 & \text{if } x > a. \end{cases} \tag{5}$$

Let us now consider a slightly modified form of the problem.² Instead of one
urn we suppose there are given $n$ urns each containing white and black balls.
The probability $p_\nu$ for drawing a black ball from the $\nu$-th urn is unknown, but is
subject to an a priori probability function $F(x)$ which furnishes the a priori
probability for the relation $p_\nu \le x$, independently of $\nu$. We assume that on
drawing one ball from every urn a black ball appears $m$ times and a white ball
$n - m$ times. Putting

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n} \tag{6}$$

we ask for the a posteriori probability $P_n(x)$ for the relation $p \le x$.


1 Mathematische Zeitschrift, vol. 4 (1919) p. 92. Cf. my textbook
Wahrscheinlichkeitsrechnung und ihre Anwendungen, Wien-Leipzig 1931, p. 158. Later I proved the
Laplace-Bayes theorem for a more general class of $F(x)$: Monatshefte für Mathematik und Physik,
vol. 43 (1936) pp. 105-128.

2 This modified problem has been treated by S. Bochner, Annals of Math., Vol. 37, 1936,
p. 816.


The Bayes formula (1) must now be replaced by

$$P_n(x) = K' \int\!\!\int \cdots \int_{p_1 + p_2 + \cdots + p_n \le nx} p_1 p_2 \cdots p_m\,(1 - p_{m+1})(1 - p_{m+2}) \cdots (1 - p_n)\;dF(p_1) \cdots dF(p_n) \tag{7}$$

where $K'$ is a constant determined by (1'). It is very easy to examine the
asymptotic character of (7). We shall prove the following

THEOREM: If the first three moments of the a priori distribution $F(x)$

$$b_\nu = \int_0^1 x^\nu\,dF(x), \qquad \nu = 1, 2, 3 \tag{8}$$

exist and if the dispersion $b_2 - b_1^2$ is different from 0, the a posteriori probability
$P_n(x)$ tends for $n \to \infty$ under the condition (2) to the normal distribution (3) with

$$
\begin{aligned}
A_n &= a\,\frac{b_2}{b_1} + (1 - a)\,\frac{b_1 - b_2}{1 - b_1}, \\
\frac{1}{2H_n^2} &= \frac{1}{n}\left[a\,\frac{b_1 b_3 - b_2^2}{b_1^2} + (1 - a)\,\frac{(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2}{(1 - b_1)^2}\right].
\end{aligned} \tag{9}
$$

In order to prove the theorem we write

$$
V_\nu(p_\nu) = \begin{cases}
\dfrac{1}{b_1}\displaystyle\int_0^{p_\nu} x\,dF(x), & \text{if } \nu = 1, 2, \dots, m, \\[2ex]
\dfrac{1}{1 - b_1}\displaystyle\int_0^{p_\nu} (1 - x)\,dF(x), & \text{if } \nu = m + 1, m + 2, \dots, n.
\end{cases} \tag{10}
$$

Then formula (7) becomes

$$P_n(x) = C \int \cdots \int_{p_1 + p_2 + \cdots + p_n \le nx} dV_1(p_1)\,dV_2(p_2) \cdots dV_n(p_n). \tag{11}$$

Each $V_\nu(p_\nu)$ is a distribution function, i.e. a non-decreasing function with
$V_\nu(-\infty) = 0$, $V_\nu(\infty) = 1$. Therefore the constant $C$ in (11) is equal to 1 and
the integral represents the distribution function for the arithmetical mean
$(p_1 + p_2 + \cdots + p_n)/n$. According to the Central Limit Theorem of the theory
of probability $P_n(x)$ will converge towards a normal distribution when certain
conditions are satisfied. In every case, if $a_\nu$, $s_\nu^2$ denote the mean value and the
dispersion associated with $V_\nu(x)$, then the mean value $A_n$ and the dispersion
$S_n^2$ associated with $P_n(x)$ will be defined by

$$A_n = \frac{1}{n}\sum_{\nu=1}^{n} a_\nu, \qquad S_n^2 = \frac{1}{n^2}\sum_{\nu=1}^{n} s_\nu^2. \tag{12}$$
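We find from (10)

$$
a_\nu = \int_0^1 x\,dV_\nu(x) = \begin{cases}
\dfrac{1}{b_1}\displaystyle\int_0^1 x^2\,dF(x) = \dfrac{b_2}{b_1}, & \text{if } \nu = 1, 2, \dots, m, \\[2ex]
\dfrac{1}{1 - b_1}\displaystyle\int_0^1 x(1 - x)\,dF(x) = \dfrac{b_1 - b_2}{1 - b_1}, & \text{if } \nu = m + 1, \dots, n,
\end{cases} \tag{13}
$$

$$
s_\nu^2 = \int_0^1 x^2\,dV_\nu(x) - a_\nu^2 = \begin{cases}
\dfrac{b_3}{b_1} - \dfrac{b_2^2}{b_1^2}, & \text{if } \nu = 1, 2, \dots, m, \\[2ex]
\dfrac{b_2 - b_3}{1 - b_1} - \dfrac{(b_1 - b_2)^2}{(1 - b_1)^2}, & \text{if } \nu = m + 1, \dots, n.
\end{cases} \tag{14}
$$

We supposed the dispersion of $F(x)$ to be different from zero. It follows that

$$b_1 \neq 0, \quad 1 - b_1 \neq 0, \quad b_3 b_1 - b_2^2 \neq 0, \quad (b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2 \neq 0. \tag{15}$$

For $b_1 = 0$ would imply that $dF(x) = 0$ for all $x > 0$ and $1 - b_1 = 0$ that $dF(x)
= 0$ for all $x < 1$; in both cases the dispersion would be zero. On the other
hand, it is easily seen that the relation $b_3 b_1 - b_2^2 = 0$ is not compatible with the
condition of a non-vanishing a priori dispersion and that the same is true for
the relation $(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2 = 0$.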

The total dispersion $\sum s_\nu^2$ is equal to the sum of $m$ times the value $(b_3 b_1 - b_2^2)/b_1^2$
and $n - m$ times the value $[(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2]/(1 - b_1)^2$.

Thus we see that under the condition (2) the sum $\sum s_\nu^2$ tends to $\infty$, while the
ratio $s_\nu^2/\sum s_\nu^2$ tends to zero, if $n$ increases infinitely. These are sufficient conditions
for the validity of the Central Limit Theorem.³ The values given for $A_n$ and
$H_n^2$ in (9) follow from (12), (13), (14) and the well known relation $2H_n^2 S_n^2 = 1$.

S. Bochner in his previously quoted paper found, in a more complicated
manner, the value of $A_n$ and only showed that $P_n(x)$ tends to zero if $x < A_n$ and to
1 if $x > A_n$.

3 Cf. H. Cramér, Random Variables and Probability Distributions, Cambridge Tract in
Mathematics and Mathematical Physics, No. 36, 1937, p. 56.

EXAMPLES. If we assume the a priori probability to be uniform, i.e. $F(x) = x$,
we have

$$b_1 = \tfrac{1}{2}, \qquad b_2 = \tfrac{1}{3}, \qquad b_3 = \tfrac{1}{4}$$

and therefore from (9)

$$A_n = \tfrac{1}{3}(a + 1), \qquad \frac{1}{2H_n^2} = \frac{1}{18n}.$$
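The uniform case is a convenient check on formula (9). The sketch below evaluates (9) with exact rational arithmetic; the function name `posterior_normal_params` and the sample values of $a$ and $n$ are illustrative assumptions.

```python
from fractions import Fraction


def posterior_normal_params(b1, b2, b3, a, n):
    """Evaluate (9): return (A_n, 1 / (2 H_n^2)) from the a priori moments b1, b2, b3."""
    mean = a * b2 / b1 + (1 - a) * (b1 - b2) / (1 - b1)
    var = (a * (b1 * b3 - b2 ** 2) / b1 ** 2
           + (1 - a) * ((b2 - b3) * (1 - b1) - (b1 - b2) ** 2) / (1 - b1) ** 2) / n
    return mean, var


# Uniform a priori distribution F(x) = x, with hypothetical a = 1/4 and n = 10.
a, n = Fraction(1, 4), 10
mean, var = posterior_normal_params(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), a, n)
```

For the uniform distribution both bracketed terms in (9) equal $1/18$, so the dispersion does not depend on $a$.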


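A more general case is that of a more concentrated a priori probability function

$$F'(x) = C x^k (1 - x)^l, \qquad C = \frac{(k + l + 1)!}{k!\,l!}.$$

Here we find

$$
b_1 = \frac{k + 1}{k + l + 2}, \qquad
b_2 = \frac{(k + 1)(k + 2)}{(k + l + 2)(k + l + 3)}, \qquad
b_3 = \frac{(k + 1)(k + 2)(k + 3)}{(k + l + 2)(k + l + 3)(k + l + 4)}
$$

and the values of $A_n$ and $H_n^2$ are

$$A_n = \frac{a + k + 1}{k + l + 3}, \qquad \frac{1}{2H_n^2} = \frac{a(l - k) + (k + 1)(l + 2)}{n(k + l + 3)^2(k + l + 4)}.$$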
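These closed forms can be compared with the general formula (9). The sketch below assumes the a priori density $C x^k (1-x)^l$ with non-negative integers $k$, $l$, builds $b_1, b_2, b_3$ from the product formula above, and evaluates both expressions exactly; all function names and the sample parameters are hypothetical.

```python
from fractions import Fraction


def beta_moments(k, l):
    """b_1, b_2, b_3 of the density C x^k (1-x)^l on (0, 1)."""
    moments, value = [], Fraction(1)
    for i in range(1, 4):
        value *= Fraction(k + i, k + l + 1 + i)
        moments.append(value)
    return moments


def params_from_formula_9(b1, b2, b3, a, n):
    """General formula (9): (A_n, 1 / (2 H_n^2))."""
    mean = a * b2 / b1 + (1 - a) * (b1 - b2) / (1 - b1)
    var = (a * (b1 * b3 - b2 ** 2) / b1 ** 2
           + (1 - a) * ((b2 - b3) * (1 - b1) - (b1 - b2) ** 2) / (1 - b1) ** 2) / n
    return mean, var


def params_closed_form(k, l, a, n):
    """Closed forms stated for the density C x^k (1-x)^l."""
    mean = (a + k + 1) / (k + l + 3)
    var = (a * (l - k) + (k + 1) * (l + 2)) / (n * (k + l + 3) ** 2 * (k + l + 4))
    return mean, var


# Hypothetical parameters: k = 2, l = 1, a = 1/3, n = 7.
K, L, A, N = 2, 1, Fraction(1, 3), 7
general = params_from_formula_9(*beta_moments(K, L), A, N)
closed = params_closed_form(K, L, A, N)
```

Setting $k = l = 0$ recovers the uniform example, $A_n = (a + 1)/3$ and $1/(2H_n^2) = 1/(18n)$.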


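By introducing the moments of $F(x)$ relative to the mean value, i.e.

$$
\begin{aligned}
\beta_2 &= \int_0^1 (x - b_1)^2\,dF = b_2 - b_1^2, \\
\beta_3 &= \int_0^1 (x - b_1)^3\,dF = b_3 - 3 b_1 b_2 + 2 b_1^3,
\end{aligned} \tag{16}
$$

we can transform the general formulas (9) into

$$
\begin{aligned}
A_n &= b_1 + \frac{\beta_2}{b_1(1 - b_1)}\,(a - b_1), \\
\frac{1}{2H_n^2} &= \frac{1}{n}\left[\beta_2 + \beta_3\,\frac{a - b_1}{b_1(1 - b_1)} - \beta_2^2\,\frac{b_1^2 + a(1 - 2b_1)}{b_1^2(1 - b_1)^2}\right].
\end{aligned} \tag{9'}
$$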
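The first equation of (9') follows from (9) on writing $b_2 = \beta_2 + b_1^2$. The short derivation below is a worked verification supplied for the reader, not a step taken from the original text:

```latex
\begin{aligned}
A_n &= a\,\frac{b_2}{b_1} + (1-a)\,\frac{b_1-b_2}{1-b_1}
     = a\,\frac{\beta_2+b_1^2}{b_1} + (1-a)\,\frac{b_1(1-b_1)-\beta_2}{1-b_1} \\
    &= a\,b_1 + (1-a)\,b_1 + \beta_2\left[\frac{a}{b_1}-\frac{1-a}{1-b_1}\right]
     = b_1 + \frac{\beta_2}{b_1(1-b_1)}\,(a-b_1).
\end{aligned}
```

The bracket simplifies because $a(1 - b_1) - (1 - a)b_1 = a - b_1$; the same identity produces the coefficient of $\beta_3$ in the second equation.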


The first of these equations shows that the a posteriori mean value $A_n$ (for
all $n$) is equal to the a priori mean value $b_1$, if the experimental mean $m/n$ or $a$
coincides with the latter. On the other hand, in the case of a symmetric a priori
distribution ($b_1 = \tfrac{1}{2}$, $\beta_3 = 0$) the second equation is reduced to

$$\frac{1}{2H_n^2} = \frac{1}{n}\left(\beta_2 - 4\beta_2^2\right).$$

On the whole it is remarkable that the influence of the a priori probability does
not vanish for $n \to \infty$, in the case of our modified Bayes problem.⁴ The
explanation of this fact is to be found in a more generalized theory of the inverse
problems in probability.


UNIVERSITY OF ISTANBUL, TURKEY.


4 Cf. my papers quoted in footnote 1.





ON THE PROBABILITY THEORY OF ARBITRARILY LINKED EVENTS 


By HILDA GEIRINGER


1. Introduction. The classical Poisson problem can be stated as follows:
Let $p_1, p_2, \dots, p_n$ be the probabilities of $n$ independent events $E_1, E_2, \dots, E_n$
respectively; i.e. the probability of the simultaneous occurrence of $E_i$ and $E_j$
is equal to $p_i p_j$, that of $E_i, E_j, E_k$ is equal to $p_i p_j p_k$ and so on. We seek the
probability $P_n(x)$ that $x$ of the events shall occur. If $p_1 = p_2 = \cdots = p_n$
the problem is known as the Bernoulli problem.

More generally the $n$ events may be regarded as dependent. Let $p_{ij}$ be the
probability of the simultaneous occurrence of $E_i$ and $E_j$; $p_{ijk}$ that of $E_i, E_j, E_k$
and finally $p_{12 \dots n}$ that of $E_1, E_2, \dots, E_n$. There shall arise again the problem
of determining the probability $P_n(x)$ that $x$ of the $n$ events will take place.¹
Furthermore the asymptotic behaviour of $P_n(x)$ for large $n$ can be studied; and
we shall especially be interested in the problem of the convergence of $P_n(x)$
towards a normal distribution or a Poisson distribution.

Even in the general case which we just explained, the sums

$$S_1 = \sum_{i=1}^{n} p_i, \qquad S_2 = \sum_{i<j} p_{ij}, \qquad \dots, \qquad S_n = p_{12 \dots n}$$

of our probabilities differ only by constant factors from the factorial moments
$M_n^{(1)}, M_n^{(2)}, \dots, M_n^{(n)}$ of $P_n(x)$. For we have

$$S_\nu = \frac{1}{\nu!}\,M^{(\nu)} = \frac{1}{\nu!}\sum_{x=\nu}^{n} x(x - 1) \cdots (x - \nu + 1)\,P_n(x).$$

Starting from this remark the author has, in earlier papers [8, 9, 10], established
a theory of the asymptotic behaviour of $P_n(x)$, making use of the theory of
moments. The criterion for the convergence of $P_n(x)$ towards the normal (or
the Poisson) distribution consists of certain conditions² which the $S_\nu$ must
satisfy.

In the following section a concise statement of the whole problem will be
given, independently of the author's earlier publications. For the convergence
towards the normal distribution we shall be able to establish a theorem under
wider conditions in a manner which seems to be simpler. Finally, some
applications of the theory will be considered.


1 See, for instance, references [1]-[7] at end of paper.

2 Using the "theorem of the continuity of moments," Professor v. Mises [11] established
sufficient conditions for the convergence of $P_n(x)$ towards a Poisson distribution in the
case of the problem of "iterations." However, his reasoning can be applied to the general
case without much difficulty.


2. Formulation of the problem. Let us consider the $n$-dimensional collective
(Kollektiv) consisting of a sequence of any $n$ trials. In the simplest case these
trials will be alternatives, i.e. for every trial there will exist only two results,
which we may denote by "occurrence," "non-occurrence" or by "1," "0." The
single trial may possibly be composed in various manners. For instance
we may draw $m \ge n$ times from an urn, which contains counters, bearing in
arbitrary proportions numbers from 0 to 9. The first "event" $E_1$ may consist
of the fact that the first three extracted counters bear even numbers; the second
trial $E_2$ will be regarded as successful, if the sum of the counters extracted at
the second, third and fourth drawings is greater than five, etc. In every case
the result of the $n$ trials will be expressed by $n$ numbers, each of them equal to
0 or 1. The result $(1, 1, 0, 0, 0, \dots, 1)$, for instance, means that the first, the
second, and the last trial were successful, the third, fourth, $\dots$ unsuccessful,
and we have an arithmetical probability distribution $v(x_1, x_2, \dots, x_n)$ $(x_k = 0, 1;\ 
k = 1, 2, \dots, n)$, where

$$\sum_{x_1} \cdots \sum_{x_n} v(x_1, x_2, \dots, x_n) = 1. \tag{1}$$

Instead of the $2^n - 1$ values of $v$ we will deal with certain groups of partial
sums of them; the first is

$$\sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} v(x_1, x_2, \dots, x_{i-1}, 1, x_{i+1}, \dots, x_n) = p_i \qquad (i = 1, 2, \dots, n)$$

where $p_i$ is the probability that the $i$-th trial will be successful. In an analogous
manner let $p_{ij}$ be the probability that the $i$-th and the $j$-th trial are both
successful, $p_{ijk}$ the probability that the $i$-th, $j$-th and $k$-th trials are simultaneously
successful. Let us provisionally denote by $S^{(i)}$ an $(n - 1)$-tuple sum over all
variables except $x_i$, by $S^{(i,j)}$ an $(n - 2)$-tuple sum over all variables except $x_i$
and $x_j$, etc. We shall then have:

$$
\begin{aligned}
p_i &= S^{(i)}\,v(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, x_n), \\
p_{ij} &= S^{(i,j)}\,v(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, x_{j-1}, 1, x_{j+1}, \dots, x_n), \\
&\ \ \vdots \\
p_{12 \dots n} &= v(1, 1, \dots, 1).
\end{aligned} \tag{2}
$$

In the following these probabilities $p_i, p_{ij}, p_{ijk}, \dots$ will be assumed as directly
given. There are

$$\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n} = 2^n - 1$$

values of this kind and it is easily seen that the partial sums (2) are linearly
independent.
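The linear independence of the partial sums (2) means that the $2^n - 1$ values $p_i, p_{ij}, \dots$ determine $v$ completely. The sketch below computes the partial sums from a small joint distribution and then recovers $v$ by inclusion-exclusion; the inversion method, the function names, and the distribution `V` are assumptions made for illustration, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical joint distribution v(x1, x2, x3) of three dependent events.
V = {
    (0, 0, 0): Fraction(3, 16), (1, 0, 0): Fraction(1, 16),
    (0, 1, 0): Fraction(2, 16), (0, 0, 1): Fraction(1, 16),
    (1, 1, 0): Fraction(4, 16), (1, 0, 1): Fraction(1, 16),
    (0, 1, 1): Fraction(1, 16), (1, 1, 1): Fraction(3, 16),
}


def partial_sums(v, n):
    """Map each non-empty index tuple (i, j, ...) to p_ij..., as in (2)."""
    p = {}
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            p[idx] = sum(w for x, w in v.items() if all(x[i] == 1 for i in idx))
    return p


def recover_distribution(p, n):
    """Recover v from the partial sums by inclusion-exclusion over the zero positions."""
    full = dict(p)
    full[()] = Fraction(1)  # normalisation (1)
    v = {}
    for x in product((0, 1), repeat=n):
        ones = tuple(i for i in range(n) if x[i] == 1)
        zeros = [i for i in range(n) if x[i] == 0]
        total = Fraction(0)
        for k in range(len(zeros) + 1):
            for extra in combinations(zeros, k):
                total += (-1) ** k * full[tuple(sorted(ones + extra))]
        v[x] = total
    return v


P_SUMS = partial_sums(V, 3)
```

Here `P_SUMS` has $2^3 - 1 = 7$ entries, and `recover_distribution(P_SUMS, 3)` returns the original eight values of $v$.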

If, especially, the probability $v(x_1, x_2, \dots, x_n)$ depends only on the number
of zeros amongst $x_1, x_2, \dots, x_n$, i.e. if

$$
\begin{aligned}
v(1, 0, \dots, 0) &= v(0, 1, 0, \dots, 0) = \cdots = v(0, 0, \dots, 1), \\
v(1, 1, \dots, 0) &= v(1, 0, 1, \dots, 0) = \cdots = v(0, 0, \dots, 0, 1, 1), \\
&\ \ \vdots
\end{aligned}
$$

the value of $p_i$ is independent of $i$, the value of $p_{ij}$ independent of $i$ and $j$,
and so on:

$$
\begin{aligned}
p_1 &= p_2 = \cdots = p_n, \\
p_{12} &= p_{13} = \cdots = p_{n-1,n}, \\
&\ \ \vdots
\end{aligned}
$$

In the particular case of independent events we have only to deal with $n$
probabilities, namely $p_1, p_2, \dots, p_n$. We have indeed $p_{ij} = p_i p_j$; $p_{ijk} =
p_i p_j p_k$, $\dots$, $p_{12 \dots n} = p_1 p_2 \cdots p_n$.

In the case of chains however, we need only know $(2n - 1)$ values, namely
$p_1, p_2, \dots, p_n$; $p_{12}, p_{23}, \dots, p_{n-1,n}$. The other $p_{ij}$, and the $p_{ijk}, \dots, p_{12 \dots n}$
can be expressed in terms of the above probabilities.

Returning now to the general case it is easily seen that in the expression for
$P_n(x)$ the $p_i, p_{ij}, \dots$ will appear only in the following combinations

$$S_n(0) = 1, \qquad S_n(1) = \sum_{i} p_i, \qquad S_n(2) = \sum_{i<j} p_{ij}, \qquad \dots, \qquad S_n(n) = p_{12 \dots n}. \tag{3}$$

Indeed, at the basis of the solution of the "problem of sums," there are the
following relations [11] between the $S_n(z)$ and the $P_n(x)$:

$$S_n(z) = \sum_{x=z}^{n} \binom{x}{z} P_n(x) \qquad (z = 0, 1, \dots, n). \tag{4}$$

The linear equations (4) may be solved (by recurrence) for the $P_n(x)$ and we
find the important result that

$$P_n(x) = \sum_{z=x}^{n} (-1)^{z-x} \binom{z}{x} S_n(z). \tag{5}$$
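Relations (4) and (5) are easy to exercise in the independent (Poisson) case, where $S_n(z)$ is the $z$-th elementary symmetric polynomial of $p_1, \dots, p_n$. The sketch below computes $P_n(x)$ once from (5) and once by direct convolution; the function names and the probabilities in `PROBS` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def symmetric_sums(p):
    """S_n(0..n) for independent events: elementary symmetric polynomials of p."""
    e = [Fraction(1)] + [Fraction(0)] * len(p)
    for pi in p:
        for z in range(len(p), 0, -1):  # descending z, so each p_i is used once
            e[z] += pi * e[z - 1]
    return e


def distribution_from_sums(S):
    """Formula (5): P_n(x) = sum over z >= x of (-1)^(z-x) C(z, x) S_n(z)."""
    n = len(S) - 1
    return [sum((-1) ** (z - x) * comb(z, x) * S[z] for z in range(x, n + 1))
            for x in range(n + 1)]


def direct_distribution(p):
    """P_n(x) for independent events by adding one event at a time."""
    P = [Fraction(1)]
    for pi in p:
        nxt = [Fraction(0)] * (len(P) + 1)
        for x, w in enumerate(P):
            nxt[x] += w * (1 - pi)   # the new event does not occur
            nxt[x + 1] += w * pi     # the new event occurs
        P = nxt
    return P


PROBS = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
S = symmetric_sums(PROBS)
P_FROM_SUMS = distribution_from_sums(S)
```

For these three events $S_3(1) = 13/12$, $S_3(2) = 3/8$, $S_3(3) = 1/24$, and both routes give $P_3(0) = 1/4$.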


Let $M_n^{(z)}$ be the $z$-th factorial moment of $P_n(x)$, i.e.

$$M_n^{(z)} = \sum_{x=z}^{n} x(x - 1) \cdots (x - z + 1)\,P_n(x). \tag{6}$$

Making use of (4) and (6) we obtain

$$M_n^{(z)} = z!\,S_n(z). \tag{7}$$
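Relation (7) does not require independence. The sketch below checks it by brute force on a small joint distribution of dependent events, computing $S_n(z)$ directly from its definition in (3); the distribution `V` and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial, prod

# Hypothetical joint distribution v(x1, x2, x3) of three dependent events.
V = {
    (0, 0, 0): Fraction(3, 16), (1, 0, 0): Fraction(1, 16),
    (0, 1, 0): Fraction(2, 16), (0, 0, 1): Fraction(1, 16),
    (1, 1, 0): Fraction(4, 16), (1, 0, 1): Fraction(1, 16),
    (0, 1, 1): Fraction(1, 16), (1, 1, 1): Fraction(3, 16),
}


def sums_S(v, n):
    """S_n(z), z = 0..n: sum over z-subsets of the probability that all of them occur."""
    S = []
    for z in range(n + 1):
        total = Fraction(0)
        for idx in combinations(range(n), z):
            total += sum(w for x, w in v.items() if all(x[i] == 1 for i in idx))
        S.append(total)
    return S


def count_distribution(v, n):
    """P_n(x): probability that exactly x of the n events occur."""
    P = [Fraction(0)] * (n + 1)
    for x, w in v.items():
        P[sum(x)] += w
    return P


def factorial_moment(P, z):
    """Formula (6); the product x(x-1)...(x-z+1) vanishes automatically when x < z."""
    return sum(prod(range(x - z + 1, x + 1)) * P[x] for x in range(len(P)))


S = sums_S(V, 3)
P = count_distribution(V, 3)
```

Here $S_3(1) = 25/16$ is the mean number of occurrences, and `factorial_moment(P, z)` equals `factorial(z) * S[z]` for every $z$.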









Our aim is to obtain information concerning the asymptotic behaviour of $P_n(x)$
by studying that of the moments of $P_n(x)$. The moments however are easily
seen to be given in terms of the $S_n(z)$.





3. The asymptotic behavior of $P_n(x)$. Convergence towards the normal
distribution.

a. THE PRINCIPAL THEOREM. According as the mean value

$$M_n^{(1)} = S_n(1) = a_n = \sum_{x=1}^{n} x\,P_n(x) \tag{8}$$

remains bounded or not for indefinitely increasing $n$, there are two types of
passage to a limit. In the first case the distribution will converge (under certain
conditions) towards a Poisson distribution; in the second case it will approach
(under certain conditions) a normal distribution. As regards the convergence
towards the Poisson distribution the author has published [9] a sufficient
condition which seems to be quite simple and general. We shall, however, not
resume this problem in the present paper.

We propose, indeed, to prove in the following pages a new theorem concerning
the convergence of

$$V_n(x) = \sum_{t \le x} P_n(t)$$

towards a normal distribution.


For this purpose we introduce the following function of the discontinuous
variable $z = 0, 1, 2, \dots, n$:

$$g_n(z) = \frac{z + 1}{a_n}\,\frac{S_n(z + 1)}{S_n(z)} \tag{9}$$

or, more concisely written, $g_z = \frac{z + 1}{a}\,\frac{S_{z+1}}{S_z}$, where $S_n(z)$ is defined by (3).
Putting $z = a_n u$, let us consider

$$g_n(a_n u) = h_n(u) \tag{10}$$

where $u$ is regarded as a continuous variable in the interval from 0 to $\epsilon$ $(\epsilon > 0)$.
Denoting the variance of $x$ for $V_n(x)$ by $M_2 = s_n^2$, we shall prove the

THEOREM: Let the function $h_n(u)$, defined by (10), satisfy the following conditions:

(i) If $n$ is sufficiently large, $h_n(u)$ admits derivatives of every order in the interval
$(0, \epsilon)$.

(ii) At $u = 0$, the first derivative of $h_n(u)$ has a limit, for $n \to \infty$, which is different
from $-1$.

(iii) If $u$ is in the interval $(0, \epsilon)$ the $k$-th derivative of $h_n(u)$ remains, for every $k$,
inferior to a bound $N_k$ which is independent of $n$.

Then

$$\lim_{n \to \infty} V_n\left(a_n + y\,s_n\sqrt{2}\right) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{y} e^{-x^2}\,dx. \tag{11}$$


We shall see that in many applications these conditions may reasonably be
assumed as satisfied.

b. DEMONSTRATION OF THE THEOREM.

In order to prove the principal theorem, stated above, we shall at first deduce
some properties of the (finite) differences of $g_n(z)$ $(z = 0, 1, \dots)$ from the
assumptions (i), (ii), (iii) which deal with the derivatives of $h_n(u)$. Indeed, the $\kappa$-th
difference of $g_n(z)$ with respect to $z$ (which contains the values of $g_n(z)$ for
$z = 0, 1, \dots, \kappa$) differs only by the factor $a_n^{\kappa}$ from the $\kappa$-th divided difference of
$h_n(u)$ with respect to $u$ (which is formed by the values of $h_n(u)$ for $u = 0,
\frac{1}{a_n}, \dots, \frac{\kappa}{a_n}$). Let $n > \kappa$ and so large that $\kappa/a_n < \epsilon$; then all $u$-values used in
the formation of the $\kappa$-th divided difference of $h_n(u)$ will be in the interval $(0, \epsilon)$.
Now, as is well known, the absolute value of any divided difference of order $\kappa$
cannot be larger than the largest derivative in an interval which contains all
the abscissae used in the formation of the divided difference. But according
to hypothesis (iii) the $\kappa$-th derivatives of $h_n(u)$ in $(0, \epsilon)$ are all inferior to $N_\kappa$.
Therefore³ we have

$$\left|\,a_n^{\kappa}\,\Delta^{\kappa} g_n(z)\,\right| < N_\kappa \tag{12}$$

and for every $\gamma > 0$

$$\lim_{n \to \infty} a_n^{\kappa - \gamma}\,\Delta^{\kappa} g_n(z)\Big|_{z=0} = 0. \tag{13}$$

On the other hand from condition (ii) it follows, as is easily seen, that

$$\lim_{n \to \infty} a_n\,\Delta g_n(z)\Big|_{z=0} = \lim_{n \to \infty} a_n\left[g_n(1) - g_n(0)\right] = c \neq -1. \tag{14}$$

The equations (13) and (14) involve only finite differences of $g_n(z)$.

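Let us now introduce certain new moments $F_\nu$ which we could call "factorial
moments about the mean." They are indeed related to the factorial moments
$M^{(\nu)}$ in exactly the same way as the moments $M_\nu$ about the mean are related
to the moments $M_\nu^0$ about the origin. Writing $S_z$, $a$ and $g_z$ instead of $S_n(z)$,
$a_n$ and $g_n(z)$, we set

$$
\begin{aligned}
F_\nu &= a^{\nu}\,\Delta^{\nu}\!\left(\frac{M^{(z)}}{a^z}\right)\Bigg|_{z=0}
= M^{(\nu)} - \nu M^{(\nu-1)} a + \binom{\nu}{2} M^{(\nu-2)} a^2 - \cdots + (-1)^{\nu} a^{\nu} \\
&= \nu!\,S_\nu - \nu!\,S_{\nu-1}\,a + \binom{\nu}{2}(\nu - 2)!\,S_{\nu-2}\,a^2 - \cdots + (-1)^{\nu} a^{\nu}
\end{aligned} \tag{15}
$$

where, particularly,

$$F_0 = 1, \qquad F_1 = 0. \tag{16}$$

From (15) we have:

$$
\begin{aligned}
M^{(\nu)} = \nu!\,S_\nu &= \sum_{z=0}^{\nu} \binom{\nu}{z} F_z\,a^{\nu - z} \\
&= F_\nu + \nu F_{\nu-1}\,a + \binom{\nu}{2} F_{\nu-2}\,a^2 + \cdots + \binom{\nu}{2} F_2\,a^{\nu-2} + a^{\nu}.
\end{aligned} \tag{17}
$$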
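The passage between the $S_\nu$ and the $F_\nu$ can be tried on the Bernoulli case, where $S_n(z) = \binom{n}{z} p^z$. The sketch below assumes the alternating-sum form of (15) and checks (16), the inverse relation (17), and the value $F_2 = -np^2$; the function names and the parameters $n = 5$, $p = 1/3$ are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def factorial_moments_about_mean(S, a):
    """F_0..F_n from (15): F_v = sum_k (-1)^(v-k) C(v, k) M^(k) a^(v-k), with M^(k) = k! S_k."""
    M = [factorial(k) * S[k] for k in range(len(S))]
    return [sum((-1) ** (v - k) * comb(v, k) * M[k] * a ** (v - k) for k in range(v + 1))
            for v in range(len(S))]


def factorial_moments_from_F(F, a):
    """Inverse relation (17): M^(v) = sum_z C(v, z) F_z a^(v-z)."""
    return [sum(comb(v, z) * F[z] * a ** (v - z) for z in range(v + 1))
            for v in range(len(F))]


# Bernoulli case with hypothetical parameters n = 5, p = 1/3.
N, P = 5, Fraction(1, 3)
S = [comb(N, z) * P ** z for z in range(N + 1)]
A = S[1]                      # the mean value a_n = n p
F = factorial_moments_about_mean(S, A)
```

In this case $F_2 / a_n = -p$, which is the constant $c$ of (14) for the Bernoulli problem, since the variance $np(1 - p)$ equals $a_n(1 + c)$.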


Let us begin by proving the following 


³ If we only want to deduce (13) it is sufficient to suppose that $N_\kappa$ (without being independent of $n$) increases more slowly than any power of $a_n$.













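Lemma I: It follows from (13) and (14) that we have for the $F_\nu$ defined by (15)

(18)  $\lim_{n\to\infty} \dfrac{F_\nu}{a^{\nu/2}} = G_\nu = \begin{cases} 0 & \text{if } \nu \text{ odd} \\ 1\cdot 3 \cdots (\nu-1)\,c^{\nu/2} & \text{if } \nu \text{ even.} \end{cases}$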





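First we conclude from (15) and (14) that (18) is true for $\nu = 1$ and $\nu = 2$.
In order to prove (18) for every $\nu$, we shall point out that

(19)  $\lim_{n\to\infty} \dfrac{F_\nu}{a^{\nu/2}} = (\nu-1)\,c \lim_{n\to\infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}} \qquad (\nu = 2, 3, \ldots)$

Setting

(20)  $f_z = g_z - 1$ and $m_z = \dfrac{z!\,S_z}{a^z} = \dfrac{M^{(z)}}{a^z}$

we get

(21)  $g_z = \dfrac{m_{z+1}}{m_z}$

and

(22)  $\Delta m_z = m_z f_z, \qquad \Delta^{\nu} m_z = \Delta^{\nu-1}(m_z f_z).$

But according to (15) we have

(23)  $\Delta^{\nu} m_z\big|_{z=0} = \dfrac{1}{a^{\nu}}\,F_\nu$

and therefore

(24)  $\dfrac{F_\nu}{a^{\nu/2}} = a^{\nu/2}\,\Delta^{\nu-1}(m_z f_z)\big|_{z=0} = a^{\nu/2} \sum S_{\alpha\beta}\,\Delta^{\alpha} m_z\,\Delta^{\beta} f_z\big|_{z=0} = \sum S_{\alpha\beta}\,\dfrac{F_\alpha}{a^{\alpha/2}}\;a^{(\nu-\alpha)/2}\,\Delta^{\beta} f_z\big|_{z=0}$
      $(\alpha + \beta \geq \nu - 1,\ \alpha \leq \nu - 1,\ \beta \leq \nu - 1).$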


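Here we have made use of the fact that the $\kappa$-th difference of a product $uv$ can
be transformed into a finite sum $\sum S_{\alpha\beta}\,\Delta^{\alpha}u\,\Delta^{\beta}v$ where $\alpha$ and $\beta$ are non-negative integers
and $\alpha \leq \kappa$, $\beta \leq \kappa$. (If we concern ourselves with derivatives and not with finite
differences, we have $\alpha + \beta = \kappa$ and $S_{\alpha\beta} = \binom{\kappa}{\alpha}$.) Suppose

$\alpha + \beta > \nu - 1.$

Then $\beta \geq \nu - \alpha$; therefore, as $\nu > \alpha$, we have $\beta > \dfrac{\nu - \alpha}{2}$. Since $\Delta^{\beta} f_z = \Delta^{\beta} g_z$,
the product $a^{(\nu-\alpha)/2}\,\Delta^{\beta} f_z\big|_{z=0}$ converges toward zero, in accordance with (13), whereas the
factor $S_{\alpha\beta}\,\dfrac{F_\alpha}{a^{\alpha/2}}$ remains bounded for every $\alpha < \nu$. Now suppose

$\alpha + \beta = \nu - 1.$

Then $\beta = \nu - 1 - \alpha$. First let $\alpha < \nu - 2$; then $\beta = \nu - 1 - \alpha > \dfrac{\nu - \alpha}{2}$.
Thus $a^{(\nu-\alpha)/2}\,\Delta^{\beta} f_z\big|_{z=0}$ converges again towards zero, whereas the other factors are
bounded as before. Next, if $\alpha = \nu - 1$, then $\beta = 0$ and $\Delta^{0} f_z\big|_{z=0} = f_0 = 0$. Thus
the corresponding term of our sum is equal to zero. Finally if $\alpha = \nu - 2$,
then $\beta = 1$, and $S_{\alpha\beta} = \nu - 1$. The corresponding term of the sum (24) will be

$(\nu - 1) \lim_{n\to\infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}} \cdot \lim_{n\to\infty} a\,\Delta f_z\big|_{z=0} = (\nu - 1)\,c \lim_{n\to\infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}}$

which completes the proof of Lemma I.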

We shall now establish a relation between the factorial moments about the
mean $F_\nu$ and the ordinary moments about the mean $M_\nu$. To an expression
of the form

(25)  $c\,a^{\rho} F_{\sigma}$

(where the constant $c$ is independent of $n$) let us attribute a "weight" $\rho + \dfrac{\sigma}{2}$.
Then we shall prove the following lemma.

Lemma II: Let $\nu = 2\mu$ ($\nu$ even), $\nu = 2\mu + 1$ ($\nu$ odd) and

(26)  $\alpha_\rho = \dfrac{\nu!}{(\nu - 2\rho)!\,2^{\rho}\,\rho!}.$

Then

(27)  $M_\nu - \sum_{\rho=0}^{\mu} \alpha_\rho\,a^{\rho} F_{\nu-2\rho}$

is equal to a finite sum of terms of the form (25), each of which has a weight less
than $\nu/2$.

To prove this lemma we begin by expressing the M, in terms of the factorial 





moments M®. We shall then express the M” by the F.. Now, let s,: be 

the ‘Stirling numbers of second kind,” i.e., putting 

(28) zg” = 2(2 — 1) --- (x: — 2 + 1) 

we have 

(29) «2 ee” (z = 0,1, 2, ---) 
x=0 


Then by an elementary calculation we obtain 


(30) M, = » M”” | — vASp1~-1 + (5) a’ 8,22 meen Oral 
p=0 - 


If we now introduce the F, we get 










v—1 v—p 
oe 7 Tr oo p . az v _ p is 1 
~ — dX 2 Pent ( r ) 5 (i) r _ 1 ) or 
‘wie ~-@- 2). ee Sas v 
(OM eg Yas # (owe 


Furthermore we may easily verify that 


Cre) Ce) Ce) 


(31) 


(32) 
1/v—p— 2\ @ 1 (v—r—p) 
anne oe + - 
ta 5?) + taro 
But the s,. for z = 0, 1, 2, --- are equal to the values of a polynomial in z, of 


2k 


degree 2x, the highest term of which is equal to 3 . The degree of the product 
K!} 2" 


(33) (’ i el *) Spe = 9(x) 


, 2 


is therefore equal to $(\nu - r - \rho) + 2\rho = \nu - r + \rho$. On the other hand the
expression between brackets in the right hand member of (31) is nothing other
than the $\nu$-th difference of $\varphi(x)$. (The missing terms of this difference are indeed
equal to zero, the corresponding Stirling numbers being equal to zero.)

This $\nu$-th difference will certainly vanish if

$\nu - r + \rho < \nu$, i.e. $r > \rho$.

Now, let $r = \rho$. Then the $\nu$-th difference, i.e. the coefficient $\alpha_\rho$ of $F_{\nu-r-\rho}\,a^{r} = F_{\nu-2\rho}\,a^{\rho}$ in (31), is equal to $\nu!$ multiplied by the coefficient of $x^{\nu}$ in $\varphi(x)$:

$\alpha_\rho = \nu! \cdot \dfrac{1}{(\nu - 2\rho)!} \cdot \dfrac{1}{2^{\rho}\,\rho!}.$

Finally, let $r < \rho$. Then the weight of $F_{\nu-r-\rho}\,a^{r}$ is inferior to $\nu/2$. We have thus
established Lemma II.
We have for instance for $\nu = 1, 2, 3, 4, 5$

$M_1 = F_1 = 0, \qquad M_2 = F_2 + a, \qquad M_3 = F_3 + 3F_2 + a,$

$M_4 = (F_4 + 6aF_2 + 3a^2) + 6F_3 + (7F_2 + a),$

$M_5 = (F_5 + 10F_3\,a) + (10F_4 + 40F_2\,a + 10a^2) + 25F_3 + (15F_2 + a).$

Inversely, in an analogous manner, we can express $F_\nu$ by the $M_\rho$ ($\rho = 1, 2, \ldots, \nu$).
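The relations listed above for the first five central moments can be checked numerically. The following is a minimal sketch, not part of the original paper: it takes an arbitrary test distribution (a binomial with 6 trials and success probability 1/3, chosen here only for illustration), forms the factorial moments, builds the $F_\nu$ from definition (15), and compares both sides with exact rational arithmetic. The helper names `falling` and `moments` are hypothetical.

```python
from fractions import Fraction
from math import comb


def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1), with the empty product equal to 1."""
    out = 1
    for j in range(k):
        out *= (x - j)
    return out


def moments(pmf, max_order):
    """Return the mean a, the list F[0..max_order] of (15), and central moments M."""
    a = sum(x * p for x, p in pmf.items())
    # factorial moments M^(k) = E[x(x-1)...(x-k+1)]
    fact = [sum(falling(x, k) * p for x, p in pmf.items())
            for k in range(max_order + 1)]
    # definition (15): F_nu = sum_k (-1)^(nu-k) C(nu,k) M^(k) a^(nu-k)
    F = [sum((-1) ** (nu - k) * comb(nu, k) * fact[k] * a ** (nu - k)
             for k in range(nu + 1))
         for nu in range(max_order + 1)]
    # ordinary moments about the mean
    M = [sum((x - a) ** nu * p for x, p in pmf.items())
         for nu in range(max_order + 1)]
    return a, F, M


# Illustrative test distribution (an assumption, not taken from the paper).
N, q = 6, Fraction(1, 3)
pmf = {x: comb(N, x) * q ** x * (1 - q) ** (N - x) for x in range(N + 1)}
a, F, M = moments(pmf, 5)

print("M_4 from definition:", M[4])
print("M_4 from the F's:   ",
      F[4] + 6 * a * F[2] + 3 * a ** 2 + 6 * F[3] + 7 * F[2] + a)
```

Because the relations are algebraic identities, any distribution with finite support should give exact agreement; the binomial is used only because its moments are easy to tabulate.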


We can now terminate our demonstration by proving the following 
Lemma III: If the conditions (18) are satisfied, then

(34)  $\lim_{n\to\infty} \dfrac{M_\nu}{M_2^{\nu/2}} = H_\nu = \begin{cases} 0 & \nu \text{ odd} \\ 1\cdot 3 \cdots (\nu - 1) & \nu \text{ even.} \end{cases}$
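First the equation (18) for $\nu = 2$ gives

$\lim_{n\to\infty} \dfrac{F_2}{a} = \lim_{n\to\infty} \dfrac{M_2}{a} - 1 = c,$

thus

(35)  $\lim_{n\to\infty} \dfrac{M_2}{a} = 1 + c \qquad (c \neq -1).$

It is therefore obviously sufficient to prove the relation

(36)  $\lim_{n\to\infty} \dfrac{M_\nu}{a^{\nu/2}} = H_\nu\,(1 + c)^{\nu/2}.$

Putting $\nu = 2\mu$ and $\nu = 2\mu + 1$ respectively we obtain however from our
lemma

$\dfrac{M_\nu}{a^{\nu/2}} = \dfrac{1}{a^{\nu/2}} \sum_{\rho=0}^{\mu} \alpha_\rho\,a^{\rho} F_{\nu-2\rho} + R\,a^{-\nu/2}.$

Here $R$ represents a finite sum of terms of the form (25), of "weight" inferior
to $\nu/2$. But by virtue of (18) such a term, divided by $a^{\nu/2}$, converges towards zero
and we obtain

(37)  $\lim_{n\to\infty} \dfrac{M_\nu}{a^{\nu/2}} = \sum_{\rho=0}^{\mu} \alpha_\rho \lim_{n\to\infty} \dfrac{F_{\nu-2\rho}}{a^{(\nu-2\rho)/2}} = \sum_{\rho=0}^{\mu} \dfrac{\nu!}{(\nu - 2\rho)!\,2^{\rho}\,\rho!}\,G_{\nu-2\rho}.$

For an odd $\nu$, $G_{\nu-2\rho}$ is equal to zero; for an even $\nu$ ($= 2\mu$, say) however, we have

$G_{2\mu-2\rho} = \dfrac{(2\mu - 2\rho)!}{2^{\mu-\rho}\,(\mu - \rho)!}\,c^{\mu-\rho}$

and we obtain

(38)  $\lim_{n\to\infty} \dfrac{M_{2\mu}}{a^{\mu}} = \sum_{\rho=0}^{\mu} \dfrac{(2\mu)!}{(2\mu - 2\rho)!\,2^{\rho}\,\rho!} \cdot \dfrac{(2\mu - 2\rho)!}{2^{\mu-\rho}\,(\mu - \rho)!}\,c^{\mu-\rho} = \dfrac{(2\mu)!}{2^{\mu}\,\mu!} \sum_{\rho=0}^{\mu} \dfrac{\mu!}{\rho!\,(\mu - \rho)!}\,c^{\mu-\rho} = H_{2\mu}\,(1 + c)^{\mu}$

in accordance with (36). Lemma III is therefore proved.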

Our principal theorem is now an obvious consequence of the well known
theorem of the continuity of moments. By virtue of this theorem the convergence of $V_n(a_n + y\,s_n\sqrt{2})$ towards a normal distribution as given by (7)
will indeed be assured if the moments of $V_n$ converge towards the moments of
the corresponding normal distribution; i.e. if (34) is true. Thus our principal
theorem is completely demonstrated.













4. Some applications. 

EXAMPLE 1. We shall consider the following game as a very simple application of our theorem: An urn contains $m = 2n$ counters bearing the numbers
$1, 2, \ldots, m$. We draw them all, one after the other, without returning the
counters previously drawn. We ask for the probability $P_{2n}(x)$ that an even counter
will appear at a drawing of even number $x$ times ($0 \leq x \leq n$).

As can be easily found, we have

$p_2 = p_4 = \cdots = p_{2n} = \dfrac{1}{2}$

$p_{24} = p_{26} = \cdots = p_{2n-2,\,2n} = \dfrac{1}{4}\,\dfrac{2n-2}{2n-1}.$

Consequently

(39)  $S_1 = \dfrac{n}{2}, \quad S_2 = \binom{n}{2}\dfrac{1}{4}\,\dfrac{2n-2}{2n-1}, \quad S_3 = \binom{n}{3}\dfrac{1}{8}\,\dfrac{(2n-2)(2n-4)}{(2n-1)(2n-2)}, \ \ldots,$
      $S_z = \dfrac{1}{2^z}\binom{n}{z}\dfrac{(2n-2)(2n-4)\cdots(2n-2z+2)}{(2n-1)(2n-2)\cdots(2n-z+1)}.$

From (39) it follows that

(40)  $g_n(z) = \dfrac{n-z}{n}\cdot\dfrac{2n-2z}{2n-z}.$

Setting $z/\tfrac{1}{2}n = u$, we get

(41)  $h_n(u) = \dfrac{(2-u)^2}{4-u}.$

The conditions (i), (ii), (iii) of our principal theorem are obviously satisfied if
$\epsilon < 4$ and we have

$h_n'(0) = -\dfrac{3}{4} = c.$

The probability defined above is thus seen to converge (according to (11)) towards a
normal distribution, having a mean equal to $\dfrac{n}{2}$ and a variance $M_2 \sim \dfrac{n}{8}$.
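The conclusions of Example 1 can be checked by direct enumeration. The sketch below rests on two assumptions that are not spelled out in this section: that the counters falling on the $n$ even-numbered drawings form a uniformly random $n$-subset of the $2n$ counters (so the count of even counters among them is hypergeometric), and that $S_z$ is the binomial moment $E\binom{x}{z}$ with $g_n(z) = (z+1)S_{z+1}/(a\,S_z)$, as relations (20) and (21) suggest. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_even_hits(n):
    """Exact distribution of the number of even counters on the n even draws."""
    total = comb(2 * n, n)
    return [Fraction(comb(n, x) * comb(n, n - x), total) for x in range(n + 1)]


def binomial_moment(pmf, z):
    """S_z taken as E[C(x, z)] (an assumption about the paper's notation)."""
    return sum(comb(x, z) * p for x, p in enumerate(pmf))


n = 50
pmf = pmf_even_hits(n)
a = binomial_moment(pmf, 1)                           # mean, expected n/2
var = sum((x - a) ** 2 * p for x, p in enumerate(pmf))


def g_direct(z):
    """g_n(z) computed from the binomial moments."""
    return (z + 1) * binomial_moment(pmf, z + 1) / (a * binomial_moment(pmf, z))


def g_formula(z):
    """g_n(z) as given by formula (40)."""
    return Fraction((n - z) * (2 * n - 2 * z), n * (2 * n - z))


print("mean:", a, " variance:", var, " n/8:", Fraction(n, 8))
print("g_n(3) direct vs (40):", g_direct(3), g_formula(3))
```

The exact variance is $n^2/(4(2n-1))$, which approaches $n/8$ as $n$ grows, in line with $1 + c = 1/4$.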


EXAMPLE 2. Probability of an "occupation." Let $k$ stones be distributed
by chance over $n$ places. Then the probability that any stone will occupy a
certain place will be equal to $1/n$. We ask for the probability $P_n(x)$ that there
shall be $x$ places, every one of which is occupied by exactly $m$ stones.⁴

By certain simple considerations, well known in combinatory calculus,
we obtain:

(42)  $a_n = S_1 = n\,\dfrac{k!}{m!\,(k-m)!}\left(\dfrac{1}{n}\right)^{m}\left(1 - \dfrac{1}{n}\right)^{k-m}$

(43)  $S_z = \dfrac{n!}{z!\,(n-z)!}\,\dfrac{k!}{(m!)^{z}\,(k-mz)!}\left(\dfrac{1}{n}\right)^{mz}\left(1 - \dfrac{z}{n}\right)^{k-mz}$

⁴ The problem presents itself for instance if we ask for the probability that in a certain
county there will be $x$ villages, every one of $m$ inhabitants.

Let $k/n = \alpha$. From (43) we deduce that
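(44)  $g_n(z) = \dfrac{n-z}{n}\cdot\dfrac{(k-mz)(k-mz-1)\cdots(k-mz-m+1)}{k(k-1)\cdots(k-m+1)}\cdot\dfrac{\left(1 - \frac{z+1}{n}\right)^{k-mz-m}}{\left(1 - \frac{z}{n}\right)^{k-mz}\left(1 - \frac{1}{n}\right)^{k-m}}$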


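Now, let $n$ and $k$ tend simultaneously to $\infty$, in such a way that $\alpha = \dfrac{k}{n}$ remains
bounded. We get at first

(45)  $\lim_{n\to\infty} \dfrac{a_n}{n} = \dfrac{\alpha^{m}}{m!}\,e^{-\alpha}.$

As $a_n$ is seen to be of the order of magnitude of $n$ we introduce the new variables

$v = \dfrac{z}{n}$ and $v = u\,\dfrac{a_n}{n}.$

We have then (writing $\bar h$ and $h$ instead of $\bar h_n$ and $h_n$):

$g_n(z) = g_n(nv) = \bar h(v)$

$\bar h(v) = \bar h\left(u\,\dfrac{a_n}{n}\right) = h(u).$

Therefore

(46)  $\bar h(v) = (1 - v)^{1-m}\left[\dfrac{1 - \frac{1}{n(1-v)}}{1 - \frac{1}{n}}\right]^{n\alpha - m}\left[1 - \dfrac{1}{n(1-v)}\right]^{-nmv}\dfrac{(\alpha - mv)\left(\alpha - \frac{1}{n} - mv\right)\cdots\left(\alpha - \frac{m-1}{n} - mv\right)}{\alpha\left(\alpha - \frac{1}{n}\right)\cdots\left(\alpha - \frac{m-1}{n}\right)}$

These formulae show that the $\kappa$-th derivative of $\bar h(v)$ with respect to $v$ contains
only rational expressions [in the denominators of which there appear powers of
$(1 - v)$], and positive powers of $\log\left(1 - \dfrac{1}{n(1-v)}\right)$. The conditions (i) and (iii)
of our principal theorem are therefore satisfied if $\epsilon < 1$. Furthermore we have

(47)  $\left(\dfrac{d\bar h}{dv}\right)_{v=0} = -1 + m - mn\log\left(1 - \dfrac{1}{n}\right) - \dfrac{n\alpha - m}{n - 1} - m\left[\dfrac{1}{\alpha} + \dfrac{1}{\alpha - \frac{1}{n}} + \cdots + \dfrac{1}{\alpha - \frac{m-1}{n}}\right]$

and consequently

$\lim_{n\to\infty}\left(\dfrac{dh}{du}\right)_{u=0} = \left[-1 - \dfrac{m^2}{\alpha} - \alpha + 2m\right]\lim_{n\to\infty}\dfrac{a_n}{n} = -\left(1 + \dfrac{(m-\alpha)^2}{\alpha}\right)\dfrac{\alpha^{m}}{m!}\,e^{-\alpha}.$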


We have thus obtained the interesting result that,
The probability $V_n(x)$ that $x$ places at most are occupied, each one by $m$ stones,
converges towards a normal distribution if $k$ and $n$ tend simultaneously to $\infty$ in
such a way that $\lim \dfrac{k}{n} = \alpha$ is bounded. We have then

(48)  $\lim_{n\to\infty} V_n(a_n + u\sqrt{2}\,s_n) = \Phi(u)$

with

(49)  $\lim_{n\to\infty} \dfrac{a_n}{n} = \dfrac{\alpha^{m}}{m!}\,e^{-\alpha}, \qquad \lim_{n\to\infty} \dfrac{s_n^2}{a_n} = 1 - \dfrac{\alpha^{m}}{m!}\,e^{-\alpha}\left(1 + \dfrac{(m-\alpha)^2}{\alpha}\right).$

UNIVERSITY OF ISTANBUL, 
ISTANBUL, TURKEY. 


REFERENCES 


[1] H. Poincaré, Calcul des Probabilités, Paris, 1912.
[2] W. Burnside, Theory of Probability, Cambridge, 1928.
[3] G. U. Yule, Introduction to the Theory of Statistics, London, 1932.
[4] C. Jordan, Acta Litter. Scient. (Szeged), Vol. III (1927), p. 193.
[5] C. Jordan, Acta Litter. Scient. (Szeged), Vol. VII (1934), p. 103.
[6] E. J. Gumbel, Comptes Rendus, Vol. 202 (1936), p. 1627.
[7] E. J. Gumbel, Giorn. Ist. Ital. Att., Vol. XVI (1938).
[8] H. Geiringer, Comptes Rendus, Vol. 204 (1937), p. 1856.
[9] H. Geiringer, Comptes Rendus, Vol. 204 (1937), p. 1914.
[10] H. Geiringer, Revue Interbalkanique (Athens), Vol. II, pp. 1-26.
[11] R. von Mises, Zs. Ang. Math. und Mech., Vol. I (1921).














FIDUCIAL DISTRIBUTIONS IN FIDUCIAL INFERENCE* 
By S. S. Wilks


1. Introduction. The essential idea involved in the method of argument 
now known as fiducial argument, at least in a very special case, seems to have 
been introduced into statistical literature by E. B. Wilson [1] in connection with 
the problem of inferring, from an observed relative frequency in a large sample, 
the true proportion or probability p associated with a given attribute. Since 
1930 the ideas and terminology surrounding the fiducial method have been 
developed by R. A. Fisher [2, 3], J. Neyman [4, 5] and others into a system 
for making inferences from a sample of observations about the values of param- 
eters which characterize the distribution of the hypothetical population 
from which the sample is assumed to have been drawn. The functional form 
of the population distribution law is assumed to be known. The parameters
may be means, a difference between means, variances, ranges, regression
coefficients, probabilities or any other descriptive indices or combinations of 
indices which may be considered important in specifying the distribution 
function of a population. In arguing fiducially about the value of a parameter, 
a procedure applicable to some of the simple cases begins by the calculation 
from the sample of an estimate of the parameter in question. The values of 
the estimate in repeated samples of the same size will theoretically cluster 
‘near’ the true value of the parameter according to a certain distribution law 
which can, in general, be deduced from the functional form of the population
distribution law. If the distribution of the estimate involves only the one
parameter, and if, as is frequently the case, one can find a function $\psi$ of the
estimate and the parameter which has a distribution not depending on the 
parameter, then one is able to set up, in a rather simple manner, fiducial limits 
or a confidence interval for the parameter corresponding to the observed value 
of the estimate. The limits will depend on the particular method of calculating 
the estimate, the value of the estimate in the sample, and on the degree of risk 
of being wrong which one is willing to take in stating that the limits will include 
between them the value of the parameter for the population under consideration. 
In general the smaller the degree of risk, the wider apart will be the limits. 
Thus for a given pair of limits there will be an associated degree of uncertainty 
that the true value of the parameter is actually included between those limits. 
This uncertainty can be expressed by a probability $\alpha$ calculated from the
sampling distribution of the $\psi$ function of the parameter and estimate. Under
certain conditions, one can, by simply changing variables, obtain from the $\psi$


* An expository paper presented to the American Statistical Association on December 
28, 1937, at the invitation of the Program Committee. 




distribution what has been termed by Fisher a fiducial distribution function of 
the parameter. From the fiducial distribution and for a given value of the 
estimate one can actually determine fiducial limits of the parameter corresponding
to a given risk $\alpha$. It will be seen as we proceed that the fiducial
distribution plays no indispensable part in fiducial inference; the $\psi$ function
and its distribution, from which the fiducial distribution is derivable, are sufficient
for the fiducial argument in many cases that commonly arise in statistics.
We shall discuss fiducial argument and fiducial distributions from the point of
view of $\psi$ functions.


2. Example. To illustrate these points let us consider an example, namely,
the problem of determining fiducial limits and the fiducial distribution of the
range of a rectangular distribution for a given value of the range in a sample
"randomly drawn" from it.

If a sample of $n$ individuals is drawn from a population whose distribution
law is $f(x, \theta) = 1/\theta$, where only values of $x$ between 0 and $\theta$ are considered
(that is, a rectangular distribution having range $\theta$), the probability that the
range $r$ of the sample lies between $r$ and $r + dr$ is $g(r, \theta)\,dr$, where

(1)  $g(r, \theta) = \dfrac{n(n-1)}{\theta^{n}}\,(\theta - r)\,r^{n-2}.$

Here $\theta$ is the parameter under question, and $r$ is the estimate; $r$ is the difference
between the largest and smallest variate in the sample. Thus, for a given
value of $\theta$, say $\theta_0$, $g(r, \theta_0)$ is a sampling distribution law defined for
values of $r$ on the range $r = 0$ to $r = \theta_0$. If we let $r/\theta = \psi$, then


(2)  $g(r, \theta)\,dr = n(n-1)(1 - \psi)\,\psi^{n-2}\,d\psi = G(\psi)\,d\psi,$

which, from a statistical point of view, shows that if we should take an aggregate
of randomly drawn samples (of $n$ items each) from rectangular populations
and calculate $\psi$ for each sample-population combination, then the distribution
of $\psi$ will be that given in (2). By a sample-population combination in this
example we mean any rectangular population that may arise and a "randomly
drawn" sample from it. The possible values of $\psi$ range from 0 to 1. Thus if
$\psi_\alpha$ is such that

(3)  $n(n-1)\displaystyle\int_0^{\psi_\alpha} (1 - \psi)\,\psi^{n-2}\,d\psi = \alpha$, i.e. $\psi_\alpha^{\,n-1}\,[n - (n-1)\psi_\alpha] = \alpha,$

and if we draw a sample of $n$ from a rectangular population, we can claim that
the probability is $1 - \alpha$ that the $\psi$ produced by this sample-population combination
will satisfy the inequality

(4)  $\psi_\alpha < \psi < 1.$


It should be observed that there are many pairs of numbers, say $\psi'_\alpha$ and $\psi''_\alpha$,
such that we can claim that $\psi'_\alpha < \psi < \psi''_\alpha$, with probability $1 - \alpha$ of being
correct in making the claim. $\psi'_\alpha$ and $\psi''_\alpha$ are ordinarily chosen so that the
interval formed by them is as short as possible (or approximately so) in some
sense. Inequality (4) is equivalent to each of the following inequalities

(5)  $\psi_\alpha < \dfrac{r}{\theta} < 1, \qquad r < \theta < \dfrac{r}{\psi_\alpha}.$




Now $\psi_\alpha$ can be determined from (3) when $n$ and $\alpha$ are given. For example,
if $\alpha = .01$ and $n = 10$, we find from (3) that $\psi_\alpha = .495$. For a given sample,
the fiducial limits $r/\psi_\alpha$ and $r$ can be calculated from $\psi_\alpha$ and the sample. It
will be noticed that fiducial limits are nothing more nor less than random
variables that fluctuate from sample to sample. The interval between $r$ and
$r/\psi_\alpha$ is called a confidence interval or fiducial interval; $1 - \alpha$ is known as the
confidence coefficient [4] associated with the limits. Hence, in repeated samples
of $n$ from a rectangular population with range $\theta_0$, $100(1 - \alpha)$ percent of the
samples will produce fiducial limits $r/\psi_\alpha$ and $r$ which include the fixed value $\theta_0$
between them. This statement holds regardless of the value of $\theta_0$. Hence
in an aggregate of sample-population combinations, the aggregate of pairs of
fiducial limits $r/\psi_\alpha$ and $r$ will, in $100(1 - \alpha)$ percent of the combinations, include
between them the true value of the range of the population. Furthermore,
whether there is only one rectangular population for all sample-population
combinations or many different rectangular populations, this statement
remains true, thus showing that the method of fiducial limits for inferring the
value of the parameter is independent of any a priori distribution of rectangular
populations in an aggregate of sample-population combinations, the distribution
being with respect to values of $\theta$.
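The determination of $\psi_\alpha$ from (3) has no closed form for general $n$, but the left member of (3) increases steadily from 0 to 1 on the interval from 0 to 1, so a simple bisection suffices. The following is a minimal sketch; the function names `cdf_psi` and `psi_alpha` are hypothetical, and the tolerance is an arbitrary choice.

```python
def cdf_psi(psi, n):
    """Cumulative distribution of psi = r / theta: psi^(n-1) [n - (n-1) psi]."""
    return psi ** (n - 1) * (n - (n - 1) * psi)


def psi_alpha(alpha, n, tol=1e-12):
    """Solve cdf_psi(psi, n) = alpha on (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_psi(mid, n) < alpha:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
    return (lo + hi) / 2


p = psi_alpha(0.01, 10)
print("psi_alpha for alpha = .01, n = 10:", round(p, 4))
print("upper limit as a multiple of the sample range:", round(1 / p, 3))
```

For $\alpha = .01$ and $n = 10$ the root should fall close to the value quoted in the text, and its reciprocal gives the factor by which the sample range is multiplied to obtain the upper fiducial limit.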

Let us look at the matter geometrically. Suppose we are drawing samples
from a rectangular population with $\theta = \theta_0$. The $r$ for each sample is represented
by a dot along $Or$ in Figure 1; corresponding to each dot there is a confidence
interval cutting across the V-shaped region $MOR$. The probability is
$1 - \alpha$ that a confidence interval computed from a sample from the population
having range $\theta_0$ will cut the line $\theta_0 K$. The cutting of $\theta_0 K$ by a confidence
interval is equivalent to the statement that $\theta_0$ is included between the corresponding
fiducial limits.

From a practical statistical point of view what we have said has the following
meaning: If on each occasion in which a randomly drawn sample of $n$ from
some rectangular population is considered, one (i) calculates the numbers $r/\psi_\alpha$
and $r$, and (ii) asserts that the range in the population producing the sample
lies between these two computed limits, then in about $100(1 - \alpha)$ percent of
the cases assertion (ii) will be correct (theoretically). Thus, in dealing with
samples of 10 individuals from rectangular populations, one would be correct
(theoretically) in about 99 percent of the cases by asserting that the population
range will lie between the sample range and 2.020 ($= 1/.495$) times the sample
range. More generally, one need not use the same value of $n$ all the way
through, provided that for the given $\alpha$ one evaluates $\psi_\alpha$ according to (3), for
each $n$ that arises. It will be seen from (3) that as $n$ increases, the value
of $\psi_\alpha$ tends to 1 and hence the fiducial limits $r/\psi_\alpha$ and $r$ for any given sample
tend to the same value, namely the sample range, thus showing that fiducial
inferences about $\theta$ can be made arbitrarily certain by taking sufficiently large
samples.
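The frequency interpretation just described lends itself to a sampling experiment. The sketch below is an illustration only: it draws repeated samples of 10 from a rectangular population whose range (here 3.0, an arbitrary choice) is treated as unknown, forms the limits $r$ and $r/\psi_\alpha$ for each, and counts how often they include the true range. The function names, the number of trials, and the seed are assumptions made for the example.

```python
import random


def psi_alpha(alpha, n, tol=1e-12):
    """Root of psi^(n-1) [n - (n-1) psi] = alpha on (0, 1), by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** (n - 1) * (n - (n - 1) * mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coverage(theta, n, alpha, trials, seed=0):
    """Fraction of samples whose limits (r, r / psi_alpha) include theta."""
    rng = random.Random(seed)
    psi = psi_alpha(alpha, n)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        r = max(xs) - min(xs)              # sample range
        if r < theta < r / psi:
            hits += 1
    return hits / trials


print("observed coverage:", coverage(theta=3.0, n=10, alpha=0.01, trials=20000, seed=1))
```

The observed fraction should lie near $1 - \alpha = .99$ whatever value of the range is chosen, which is the sense in which the statement "holds regardless of the value of $\theta_0$."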

It is evident that the method of fiducial limits furnishes a satisfactory procedure
for inferring the value of the population range $\theta$ from samples drawn
from rectangular populations. Let us now go a step further and consider the
fiducial distribution of $\theta$ and how it fits into the scene. The cumulative distribution
of $\psi$ is

(6)  $\psi^{n-1}\,[n - (n-1)\psi]$

and hence the cumulative distribution of $r$ for a fixed $\theta$, say $\theta_0$, is

(7)  $F(r, \theta_0) = \left(\dfrac{r}{\theta_0}\right)^{n-1}\left[n - (n-1)\left(\dfrac{r}{\theta_0}\right)\right]$

which increases from 0 to 1 as $r$ increases from 0 to $\theta_0$.

[Fig. 1]

Geometrically, $z = F(r, \theta)$ can be represented as a surface defined over the region bounded by lines
$O\theta$ and $OR$ in Figure 1, such that $z$ is zero along $O\theta$ and is unity along the line
$OR$ ($r = \theta$). $F(r, \theta)$ is continuous inside the region $\theta OR$, and for any given
value $r_0 \neq 0$ of $r$, $F(r_0, \theta)$ decreases from 1 to 0 as $\theta$ increases from $r_0$ to $\infty$.
The curves having the equations

$\begin{cases} z = F(r, \theta) \\ \theta = \theta_0 \end{cases}$ and $\begin{cases} z = F(r, \theta) \\ r = r_0 \end{cases}$











(where $\theta_0$, $r_0$, and $\theta_\alpha$ are such that $r_0/\theta_\alpha = \psi_\alpha$ and $F(r_0, \theta_\alpha) = \alpha$) are the
curves $C$ and $D$ respectively. $C$ is the cumulative distribution of ranges of
samples of $n$ from a rectangular population with range $\theta_0$. The curve $D$ has
the mathematical characteristics of a cumulative distribution function cumulated
in the negative direction with respect to $\theta$: its ordinates increase from
0 to 1 as $\theta$ decreases from $\infty$ to $r_0$. Thus, if we take $-\dfrac{\partial}{\partial\theta}F(r_0, \theta)$ we get a
function $g(\theta, r_0)$ which has the essential mathematical characteristics of a
distribution function: it is non-negative, can be integrated over any interval
of $\theta$, and has total area under it equal to unity. We have

(8)  $g(\theta, r_0) = n(n-1)\,\dfrac{r_0^{\,n-1}}{\theta^{n}}\left(1 - \dfrac{r_0}{\theta}\right)$

and it is called the fiducial distribution of $\theta$ for $r = r_0$. It must be firmly
pointed out that $\theta$ is not a random variable and hence $g(\theta, r_0)$ is not a distribution
function of a random variable, although it has the mathematical properties
of such a distribution. Objections have been raised to the use of the term
fiducial distribution on the grounds that the thing to which it applies is not a
distribution at all. However, as long as the term is carefully defined there
should be no ambiguity in using it. From an analytical point of view, the
problem of obtaining the fiducial distribution of $\theta$ is only a matter of changing
variables, for since

(9)  $g(r, \theta)\,dr = g(\theta, r)\,d\theta = n(n-1)(1 - \psi)\,\psi^{n-2}\,d\psi$

and $\theta_\alpha = r_0/\psi_\alpha$, we have

(10)  $\displaystyle\int_{\theta_0\psi_\alpha}^{\theta_0} g(r, \theta_0)\,dr = \int_{r_0}^{r_0/\psi_\alpha} g(\theta, r_0)\,d\theta = \int_{\psi_\alpha}^{1} n(n-1)(1 - \psi)\,\psi^{n-2}\,d\psi = 1 - \alpha.$


We remark again that

(11)  $\displaystyle\int_{r_0}^{\theta_1} g(\theta, r_0)\,d\theta$

is not to be interpreted as probability as though $\theta$ were a random variable.
Instead, the meaning is as follows: Let $r_0$ be the range in a sample known to
be from some rectangular population, and let the value of $r_0$ be inserted in
(11), and let $\theta_1$ be determined so that the value of the integral is $1 - \alpha$. The
two limits for the integral are fiducial limits associated with the sample for the
confidence coefficient $1 - \alpha$, which were discussed earlier. Thus, for each
sample, we can compute fiducial limits using the fiducial distribution. These
limits, as we have seen by considering the $\psi$ function, fluctuate from sample
to sample in such a way that the probability is $1 - \alpha$ that they will include
between them the true value of the range of the population under consideration.
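That the integral (11) of the fiducial distribution reproduces the $\psi$ calculation can be confirmed numerically. In the sketch below the density is taken to be $n(n-1)\,r_0^{\,n-1}\theta^{-n}(1 - r_0/\theta)$, i.e. minus the derivative of the cumulative distribution of the range with respect to $\theta$; the midpoint rule, the step count, and the particular values $r_0 = 1$, $\theta_1 = 2$, $n = 10$ are arbitrary choices made for illustration.

```python
def fiducial_density(theta, r0, n):
    """g(theta, r0) = n(n-1) r0^(n-1) / theta^n * (1 - r0 / theta), for theta >= r0."""
    return n * (n - 1) * r0 ** (n - 1) / theta ** n * (1 - r0 / theta)


def cdf_psi(psi, n):
    """Cumulative distribution of psi = r / theta."""
    return psi ** (n - 1) * (n - (n - 1) * psi)


def integrate(f, lo, hi, steps=100000):
    """Plain midpoint rule; adequate for this smooth integrand."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


n, r0, theta1 = 10, 1.0, 2.0
area = integrate(lambda t: fiducial_density(t, r0, n), r0, theta1)
print("integral (11) from r0 to theta1:", area)
print("1 - F(r0, theta1):              ", 1 - cdf_psi(r0 / theta1, n))
```

The two printed values should agree to within the integration error, so choosing $\theta_1$ to make the integral equal to $1 - \alpha$ amounts to taking $\theta_1 = r_0/\psi_\alpha$.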


3. Summary of Principles. From the point of view we have taken the
essential notions involved in the method of fiducial argument and fiducial
distributions for the case of a continuous variate and one parameter can be
readily abstracted from the example just discussed. In general, we have the
following steps:

(a) A sample is assumed to be randomly drawn from a population with a
distribution of known functional form $f(x, \theta)$, $\theta$ being a parameter. Let
$x_1, x_2, \ldots, x_n$ be the values of $x$ in the sample.

(b) A function, say $\psi(x_1, \ldots, x_n, \theta)$ of the sample $x$'s and $\theta$ is devised so
that its sampling distribution $G(\psi)$ involves $\theta$ and the $x$'s only as they
enter into $\psi$. The value of $\theta$ in $\psi$ is that for the population from which
the sample is actually drawn.

(c) Two numerical values of $\psi$, say $\psi'_\alpha$ and $\psi''_\alpha$, are chosen (ordinarily as close
together as possible) so that the probability computed from $G(\psi)$ is
$1 - \alpha$ (e.g. 0.95) that $\psi$ will lie between $\psi'_\alpha$ and $\psi''_\alpha$; more briefly
$P(\psi'_\alpha < \psi < \psi''_\alpha) = 1 - \alpha$.

(d) The inequality $\psi'_\alpha < \psi < \psi''_\alpha$, which contains only one unknown, namely $\theta$,
is solved for $\theta$ giving the equivalent inequality $\underline{\theta} < \theta < \bar{\theta}$ where $\underline{\theta}$ and $\bar{\theta}$
are fiducial limits and are subject to sampling fluctuations.

(e) The expression $P(\psi'_\alpha < \psi < \psi''_\alpha) = 1 - \alpha$ is replaced by the equivalent
expression $P(\underline{\theta} < \theta < \bar{\theta}) = 1 - \alpha$ which states that the probability is
$1 - \alpha$ that a sample will yield values $\underline{\theta}$ and $\bar{\theta}$ which will include the true
value of $\theta$ between them.

(f) The differential element for the fiducial distribution of $\theta$ is $G(\psi)\left|\dfrac{\partial\psi}{\partial\theta}\right| d\theta$
(provided $\partial\psi/\partial\theta$ is a function of $\theta$ which does not change sign for a given
sample of $x$'s) and is obtained by letting $\theta$ be the variable in $G(\psi)\,d\psi$,
keeping the $x$'s fixed.
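Steps (a) to (e) can be sketched as a small routine. The example chosen here is an assumption made for illustration and is not one treated in the paper: a normal population with unknown mean $\theta$ and known standard deviation $\sigma$, for which $\psi = \sqrt{n}(\bar{x} - \theta)/\sigma$ is normally distributed with zero mean and unit variance. The function names, the data, and the search bracket are all hypothetical; the inequality of step (d) is solved by bisection, which requires only that $\psi$ decrease steadily as $\theta$ increases.

```python
from math import sqrt
from statistics import NormalDist, mean


def solve_decreasing(f, target, lo, hi, tol=1e-10):
    """Bisection for f(theta) = target when f is strictly decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid          # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def fiducial_limits(psi, psi_low, psi_high, bracket):
    """Step (d): turn psi_low < psi(theta) < psi_high into limits on theta."""
    lo, hi = bracket
    theta_lower = solve_decreasing(psi, psi_high, lo, hi)
    theta_upper = solve_decreasing(psi, psi_low, lo, hi)
    return theta_lower, theta_upper


# Step (a): an illustrative sample, with sigma assumed known.
xs = [4.1, 5.3, 4.8, 5.9, 5.0]
sigma = 1.0
n, xbar = len(xs), mean(xs)


# Step (b): a psi function whose distribution does not involve theta.
def psi(theta):
    return sqrt(n) * (xbar - theta) / sigma


# Step (c): symmetric limits on psi for 1 - alpha = 0.95.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

lower, upper = fiducial_limits(psi, -z, z, (xbar - 10 * sigma, xbar + 10 * sigma))
print("fiducial limits for theta:", round(lower, 4), round(upper, 4))
```

In this case the limits can also be written down directly as $\bar{x} \mp z\sigma/\sqrt{n}$; the numerical solution is shown only because the same routine applies when no explicit inversion is available.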

To give precisely the conditions under which all of these steps can be performed
is a technical matter which will not be considered here. It is sufficient
to remark that they can be performed in many cases of practical interest.
Fiducial argument can be carried on using only the first five steps without
introducing the notion of a fiducial distribution. In connection with step (a)
it should be particularly noticed that the functional form $f(x, \theta)$ of the population
under question is assumed to be known and that the sample under
consideration is "randomly drawn" from the population. Thus, in applying
the theory to practical problems it is a matter of fundamental importance
that these two assumptions be valid. In cases where a sufficient amount of
data exists, it can usually be satisfactorily tested by using the $\chi^2$ test and other
devices, whether or not a given functional form for $f(x, \theta)$ is a valid assumption.
In cases where sufficient data do not exist for actually making such a test,
justification for assuming a given functional form usually has to be made on the
basis of theoretical considerations. From a practical point of view the notion
of randomness is characterized by methods of drawing samples rather than
a posteriori mathematical considerations of the sample after it has been drawn,
and thus the question of randomly drawing samples depends largely upon the
experience and sound judgment of the experimenter. However, after one or
more samples have been drawn "at random," the problem of arguing from
them about the populations from which they were drawn is largely mathematical.


4. Case of large samples. For a population with a distribution of known
functional form, a fiducial distribution of the parameter clearly depends on the
size of the sample and the particular estimate used. For example, in large
samples, we would get a fiducial distribution of the mean of a normal population
of known variance by using the sample mean which would be different
from the one obtained using the median of the sample. In order to be able to
make the inferences about $\theta$ as accurate as possible, a $\psi$ function should theoretically
be used which will produce fiducial limits which are closest together,
on the average, or perhaps "best" in some other sense, for a given $\alpha$. The
fiducial distribution obtainable from such a $\psi$ could then be referred to as the
"best" fiducial distribution, and theoretically it should be used in preference
to other possible fiducial distributions if fiducial distributions are to be used
at all to set fiducial limits. In large samples from a population with a distribution
function $f(x, \theta)$, it is known [6] that, under rather general conditions,
fiducial limits which are closest together on the average can be obtained by
letting


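(12)  $\psi = \dfrac{\partial L/\partial\theta}{\sqrt{n\,E\left[\left(\dfrac{\partial \log f(x, \theta)}{\partial\theta}\right)^{2}\right]}}$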


and treating $\psi$ as a normally distributed variate with zero mean and unit
variance, where $L = \sum_{i=1}^{n} \log f(x_i, \theta)$, the logarithm of the likelihood of $\theta$ for
the given sample, $x_1, x_2, \ldots, x_n$ are values of $x$ in the sample, and $E$ denotes
mathematical expectation. For example, in the case of a binomial population
where each individual belongs either to class A or class B, we have $f(x, \theta) = \theta^{x}(1 - \theta)^{1-x}$
where $\theta$ is the probability associated with class A, and $x$ will be 0 or 1
according to whether an individual belongs to B or A. In a sample of $n$ individuals,
$L = m \log\theta + (n - m)\log(1 - \theta)$ where $m$ is the number of individuals
in class A. Here $E\left[\left(\dfrac{\partial \log f(x, \theta)}{\partial\theta}\right)^{2}\right] = \dfrac{1}{\theta(1 - \theta)}$, and we get $\psi = \dfrac{m - n\theta}{\sqrt{n\theta(1 - \theta)}}$.
If we should want to find fiducial limits of $\theta$ for a confidence coefficient of .95
we would solve the equations $\dfrac{m - n\theta}{\sqrt{n\theta(1 - \theta)}} = \pm 1.96$ for $\theta$, thus getting two
values of $\theta$, say $\underline{\theta}$ and $\bar{\theta}$. We can then say that $\underline{\theta}$ and $\bar{\theta}$ will include the true
value of $\theta$ between them with a probability of .95 of being correct, in the sense
that if we applied this rule consistently to samples from binomial populations,
we would have a procedure that would lead to a correct statement in about 95
percent of the cases (theoretically).

To illustrate the difference between the fiducial method and the commonly
used method of placing limits on $\theta$ for $P = .95$, consider an example in
which $m = 150$, $n = 400$. The usual procedure is to replace $\theta$ by $m/n$ in
$\theta \pm 1.96\sqrt{\dfrac{\theta(1 - \theta)}{n}}$, which yields .311 and .431. The fiducial procedure is
to solve the equation $\dfrac{m - n\theta}{\sqrt{n\theta(1 - \theta)}} = \pm 1.96$ for $\theta$, thus obtaining .312 and .455.
For the case of small samples, the problem of getting "best" fiducial limits becomes
more complicated [5].
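The two procedures compared above differ only in where $\theta$ is replaced by $m/n$. The sketch below sets them side by side; it is an illustration with hypothetical function names, and the counts $m = 40$, $n = 100$ are chosen arbitrarily rather than taken from the text. Squaring the equation $(m - n\theta)/\sqrt{n\theta(1-\theta)} = \pm 1.96$ gives a quadratic in $\theta$ whose two roots are the fiducial limits.

```python
from math import sqrt


def usual_limits(m, n, z=1.96):
    """Replace theta by m/n inside theta +/- z * sqrt(theta (1 - theta) / n)."""
    p = m / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def fiducial_limits_binomial(m, n, z=1.96):
    """Roots of (m - n t)^2 = z^2 n t (1 - t), i.e. A t^2 + B t + C = 0."""
    A = n * n + z * z * n
    B = -(2 * m * n + z * z * n)
    C = m * m
    d = sqrt(B * B - 4 * A * C)
    return (-B - d) / (2 * A), (-B + d) / (2 * A)


def psi(theta, m, n):
    """The large-sample psi function for the binomial case."""
    return (m - n * theta) / sqrt(n * theta * (1 - theta))


m, n = 40, 100                         # illustrative counts only
print("usual limits:   ", usual_limits(m, n))
lo, hi = fiducial_limits_binomial(m, n)
print("fiducial limits:", (lo, hi))
print("psi at the two limits:", psi(lo, m, n), psi(hi, m, n))
```

The last line serves as a check: $\psi$ should equal $+1.96$ at the lower limit and $-1.96$ at the upper one. The fiducial limits are not symmetric about $m/n$, being drawn slightly toward one half.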

5. Extensions of Fiducial Argument. It will be observed that it is not
necessary for $\psi$ to be a function of only one statistic and $\theta$ in order to be able
to argue fiducially about $\theta$. For example, if a sample of $n$ is drawn from a
normal population with mean $\theta$, it is well known that if $\bar{x}$ is the sample mean
then

(13)  $\psi = \dfrac{(\bar{x} - \theta)\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}}$

(which is Fisher's $t$ function), has the distribution

(14)  $\dfrac{\Gamma\left(\frac{1}{2}n\right)\,d\psi}{\Gamma\left(\frac{1}{2}(n-1)\right)\sqrt{\pi(n-1)}\,\left[1 + \psi^{2}/(n-1)\right]^{n/2}}.$

Here $\psi$ is a function of two statistics, namely $\bar{x}$ and $\sum_{i=1}^{n}(x_i - \bar{x})^{2}$, and the fiducial
distribution of $\theta$ for this $\psi$ function is obtained at once by applying rule (f).

The ideas of fiducial argument may be extended in other directions, but
these cannot be considered in any detail here. For example $\psi$ may be a function
of $x_1, \ldots, x_n$ and two or more population parameters, in which case one
could set up fiducial regions for the several parameters. From a practical
point of view, the fiducial argument for two or more parameters simultaneously
has hardly been touched. Again $\psi$ may be a function of statistics from two
samples, one observed and the other not yet observed, and not involving population
parameters at all, in which case one can argue fiducially about the
statistic in question for the unobserved sample [3]. The notion of a fiducial
distribution has been extended to several parameters taken simultaneously
[3, 7], but the problem of working out relations between fiducial distributions
of several parameters and fiducial regions is yet to be investigated. The
principles may be readily applied in situations in which the $x$'s involved in $\psi$
take on discrete values. In this case the equality signs in the probability expressions
in steps (c) and (d) would be replaced by greater than or equal signs
($\geq$). Two excellent examples of the application of principles of fiducial argument
to the discrete case are furnished: (i) by a paper by Pearson and Clopper
[8] on fiducial limits of the probability $p$ from samples from a binomial population,
and (ii) by a paper by Ricker [9] on fiducial limits of $m$ in the Poisson
distribution $f(x, m) = m^{x}e^{-m}/x!$.







REFERENCES 


[1] E. B. Wilson, "Probable Inference, the Law of Succession, and Statistical Inference,"
Jour. American Statistical Association, Vol. 22 (1927), pp. 209-212.

[2] R. A. Fisher, "The Concepts of Inverse Probability and Fiducial Probability Referring
to Unknown Parameters," Proc. Royal Society of London, Series A, Vol. 139
(1933), pp. 343-348.

[3] R. A. Fisher, "The Fiducial Argument in Statistical Inference," Annals of Eugenics,
Vol. 6 (1935), pp. 391-398.

[4] J. Neyman, "On the Two Different Aspects of the Representative Method: the Method
of Stratified Sampling and the Method of Purposive Selection," Jour. Royal Statistical
Society, Vol. 97 (1934), pp. 558-625.

[5] J. Neyman, "Outline of a Theory of Statistical Estimation Based on the Classical Theory
of Probability," Phil. Trans. Roy. Soc. London, Series A, Vol. 236 (1937),
pp. 333-380.

[6] S. S. Wilks, "Shortest Average Confidence Intervals from Large Samples," Annals of
Mathematical Statistics, Vol. 9 (1938), pp. 166-175.

[7] I. E. Segal, "Fiducial Distribution of Several Parameters with Application to a Normal
System," Proc. Cambridge Phil. Soc., Vol. 34 (1938), pp. 41-47.

[8] C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits in the
Case of the Binomial," Biometrika, Vol. 26 (1934), pp. 404-413.

[9] William E. Ricker, "Fiducial Limits of the Poisson Frequency Distribution," Jour.
American Statistical Association, Vol. 32 (1937), pp. 349-356.


PRINCETON UNIVERSITY. 










BIOLOGICAL APPLICATIONS OF NORMAL RANGE AND ASSOCIATED 
SIGNIFICANCE TESTS IN IGNORANCE OF ORIGINAL 
DISTRIBUTION FORMS* 


By William R. Thompson


The word normal has been used in many senses—commonly by statisticians 
to designate a well-known distribution function. Another use familiar to bi- 
ologists, particularly in experimental work and medicine, is to denote an 
untreated or control part of a universe, or a part whose members are free from 
specified characteristics such as evidence of past or present disease or malformation.
Closely related to this last usage are attempts to delimit so-called normal
ranges of variation for a quantitative attribute of the members of part or all 
of a universe in question. Interpretations are often vague, as when the interval 
between the least and greatest values observed in either a large or a small 
number of instances is taken to estimate a normal range. We shall consider 
the problem of using ranked data for estimating normal ranges as defined in 
the next paragraph. 

If the instances have been drawn at random from a universe (U) of all
possible observations obtainable in a prescribed manner, and are enumerated
in ascending order of magnitude, $\{x_i\}$ for $i = 1, \ldots, n$; then it is proposed to
show in the present communication how ranges of the type $(x_k, x_{n+1-k})$ may
be used to estimate normal ranges, $R_f$, where the subscript f is the theoretical
probability that a random value, x, drawn from U will lie within the range $R_f$,
g that it will lie above, and g that it will lie below (where 2g = 1 - f).
Furthermore, it is proposed to show how these ranges may be used as the basis of
significance tests where altered conditions appear to lead to abnormal biological
variation. The form of frequency-distribution of U is supposed unknown,
and is without effect upon the analysis. Section 1 is a development of the
theory of range estimation, treated briefly in a previous paper [1] together with
illustrations of its application. Section 2 deals with significance tests.

1. The Method of Range Estimation. Let x be a real variate, a random
value drawn from an infinite universe or population U. Let f(x) be the
frequency function of x in U, supposed unknown; and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Then
for any given $\alpha$ and $\beta$, where $\alpha < \beta$,

$$P(\alpha < x < \beta) = \int_{\alpha}^{\beta} f(x)\,dx. \tag{1}$$


* Presented at a meeting of the American Statistical Association, December 28, 1937, 
Atlantie City, N. J. 





To facilitate development, suppose that in any finite sampling under consideration
no two values of x may be exactly the same. Let $S = \{x_k\}$, $k = 1, \ldots, n$,
denote a random sample from U, where the order of enumeration is arbitrary,
but temporarily taken as a random order (to fix the ideas, consider this the
order obtained in drawing). Let $p_k$ be defined by

$$p_k = P(x < x_k) = \int_{-\infty}^{x_k} f(x)\,dx, \quad \text{from which} \quad dp_k = f(x_k)\,dx_k. \tag{2}$$
Then $p_k$ is the probability that a random x from U shall be less than any number
$x_k$. Then obviously if $x_k$ is drawn at random from U, $p_k$ is a random variable
whose distribution is the unit rectangle; i.e., $P(p' < p_k < p'') = p'' - p'$.
Furthermore, the joint probability that $x_k$ will lie in the interval $(x_k, x_k + dx_k)$
and that exactly r values in the sample S will be less than $x_k$ is, to within terms
of higher order in $dp_k$,

$$\binom{n-1}{r}\, p_k^{\,r}\, (1 - p_k)^{\,n-1-r}\, dp_k.$$

Then, in repeated sampling as above, for the case where just r of the n random
values $\{x_i\}$ are less than the k-th drawn, let $P_r(p' < p_k < p'')$ denote the
probability that $p_k$ lies in the interval $(p', p'')$. Then

$$P_r(p' < p_k < p'') = \frac{(r+s+1)!}{r!\,s!} \int_{p'}^{p''} p^r q^s\,dp, \tag{3}$$

where s = n - 1 - r, and q = 1 - p. Obviously, the expression on the right
of (3) does not depend on k if this index is the order of draft or a random index,
but only upon the condition that exactly r of the n random values from U be
less than a value $x_k$ drawn at random from the sample of n values. Accordingly,
we obtain the same result if we enumerate the n values $\{x_i\}$ in ascending
order of magnitude ($x_i < x_j$ if $i < j$). Then k = r + 1, in the cases considered,
and (3) may be written,

$$P(p' < p_k < p'') = \frac{n!}{(k-1)!\,(n-k)!} \int_{p'}^{p''} p^{k-1} q^{n-k}\,dp, \tag{4}$$

for $0 \le p' \le p'' \le 1$. Obviously, the result is the same if we deal instead with
the k-th value ($x_k$) of every random sample S drawn. In passing it may be
noted that for p' = 0 and p'' = p in (4) we have

$$P(p_k < p) = I_p(k, n - k + 1), \tag{5}$$

which may be evaluated for $k, n - k + 1 \le 50$ by means of the Tables of the
Incomplete Beta-Function [2].

Of course, $P(0 < p_k < 1) = 1$, and (4) gives $\bar{p}_k$, the mean value of $p_k$ in
repeated random sampling of n values from U, as

$$\bar{p}_k = \frac{n!}{(k-1)!\,(n-k)!} \int_0^1 p^k q^{n-k}\,dp = \frac{k}{n+1}. \tag{6}$$











Similarly, the variance, $\sigma^2_{p_k}$, of $p_k$ is given by

$$\sigma^2_{p_k} = E[(p_k - \bar{p}_k)^2] = \frac{k(n-k+1)}{(n+1)^2\,(n+2)}. \tag{7}$$
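The mean and variance in (6) and (7) can be checked exactly from the density in (4). The following is a minimal sketch, assuming only that density; the function names are illustrative and not from the paper.

```python
from fractions import Fraction
from math import factorial


def rank_fraction_moment(n, k, m):
    """Exact E[p_k**m] when p_k has the density implied by relation (4)."""
    # Normalizing constant n! / ((k-1)! (n-k)!) from relation (4).
    norm = Fraction(factorial(n), factorial(k - 1) * factorial(n - k))
    # Integral over (0, 1) of p**(k-1+m) * (1-p)**(n-k), a complete beta integral.
    integral = Fraction(factorial(k + m - 1) * factorial(n - k), factorial(n + m))
    return norm * integral


def mean_and_variance(n, k):
    """Mean and variance of p_k, obtained from the first two moments."""
    mean = rank_fraction_moment(n, k, 1)
    variance = rank_fraction_moment(n, k, 2) - mean ** 2
    return mean, variance


# n = 56 and k = 3 are the values used later for the tetanus example.
mean, variance = mean_and_variance(56, 3)
print(mean, variance)
```

Exact rational arithmetic is used so that agreement with k/(n + 1) and with the right member of (7) is an identity rather than a floating-point approximation.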

Now suppose that we want to find a range $(\alpha, \beta)$ such that, in random drafts
from U, the theoretical relative frequency of drawing x less than $\alpha$ is g, and
the same as that of drawing x greater than $\beta$. $(\alpha, \beta)$ may be called a central
confidence range with a confidence f = 1 - 2g that x drawn at random from U
will lie within the range. For g = k/(n + 1) we may take the range $R_f =
(x_k, x_{n-k+1})$; and likewise with g = 5% we may estimate, or approximate
by interpolation where 20k > n + 1 > 20(k - 1), a range $R_f$ for normal
biological variation of a specified character, and this may be called briefly the
estimated 90% central normal range.


Of course the probability of drawing $x < \alpha$ is $\int_{-\infty}^{\alpha} f(x)\,dx$, and that of drawing
$x > \beta$ is $\int_{\beta}^{\infty} f(x)\,dx$; and these probabilities are unknown, as the frequency
function f(x) is unknown; but with $\alpha = x_k$ and $\beta = x_{n-k+1}$ the theoretical
relative frequency in each case is k/(n + 1) regardless of the universe.
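The choice of k and the resulting range estimate can be sketched as follows. This is an illustration only: it takes the smallest k with k >= g(n + 1) > k - 1 (the rule stated later in section 2), does not attempt the interpolation mentioned above, and uses hypothetical data and function names.

```python
from fractions import Fraction
from math import ceil


def rank_for_tail(n, g):
    """Smallest k with k >= g*(n+1) > k-1; g is passed as a Fraction."""
    return ceil(g * (n + 1))


def central_range(values, g):
    """Return (x_k, x_{n+1-k}) from the ranked sample, without interpolation."""
    xs = sorted(values)
    n = len(xs)
    k = rank_for_tail(n, g)
    if not 1 <= k <= n + 1 - k:
        raise ValueError("sample too small for this tail probability")
    # Ranks are 1-based in the text, list indices are 0-based here.
    return xs[k - 1], xs[n - k]


# Hypothetical sample of 56 distinct values, used only to show the indexing.
sample = list(range(1, 57))
print(rank_for_tail(56, Fraction(1, 20)), central_range(sample, Fraction(1, 20)))
```

With n = 56 and g = 5% this gives k = 3, so each tail has theoretical relative frequency 3/57 rather than exactly 5%.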

It has been shown [1] also that if the sample S were drawn at random from
a finite ordered population of aggregate number N, denoted by $U_N$, and $Np_k$
is the number of values in $U_N$ that are less than the k-th member of the given
random sample in ascending order of magnitude; then, for S a sample of n
values as before, the mean value of $p_k$ in repeated sampling is

$$\bar{p}_k = \frac{k}{n+1}\left(1 + \frac{1}{N}\right) - \frac{1}{N}, \quad \text{and}$$

$$\sigma^2_{p_k} = \frac{k(n-k+1)}{(n+1)^2\,(n+2)}\left(1 + \frac{1}{N}\right)\left(1 - \frac{n}{N}\right).$$


An example is furnished by an analysis of data reported by Wadsworth and 
Hyman [3] in a study of influences of antigenic treatment of horses upon their 
plasma concentration of esterified cholesterol, free cholesterol, and phospho- 
lipids. As in chart 1 for normal horses, a graph has been constructed for each 
horse studied, using time as abscissa and a logarithmic ordinate scale for ob- 
served values of plasma concentration of the constituents: 

1. Esterified Cholesterol, 

2. Free Cholesterol, and 

3. Phospholipids times one-tenth, 
the respective successive points for each being joined to form three polygon 
curves. As these are in all cases discrete and lie in the order of enumeration 
from top to bottom of the graph, no special label seemed needed; but estimated 
normal ranges for the central 90% of variation have been indicated in each
case by two horizontal lines between brackets at the right, numbered to
correspond with the enumeration above. The ranges are based on observations on
62 plasma samples, each from a different presumably normal horse. The normal 
horses in the chart show about the same individual variations; but, of course, 
the ranges are not to be interpreted to indicate normal variation for an indi- 
vidual animal. 

Chart 2 presents in like manner the data obtained for horses under immunization
against tetanus and the streptococcus. The tetanus immunization treatment
appears to produce marked and sustained depression in all three curves
of at least five of the six animals observed.

[Chart 1: graphs for normal horses; abscissa in days, logarithmic ordinate.]

CHART 1. On each graph for a given normal horse, the number of which appears below,
the curves in descending order respectively represent (1) esterified cholesterol, (2) free
cholesterol, and (3) one-tenth phospholipid concentration in plasma (in mg. per 100 cc.).
Corresponding 90-per-cent normal range estimates are indicated.

That this is statistically significant seems obvious. A single observation
below the 90% normal range should be expected once in twenty random trials 
if normal causes of variation may be assumed unaffected by the treatment in 
question. The expectation of obtaining 5 or more such values in six independent 
trials is obviously much less, and may be accurately estimated by means of rela- 
tions developed in the following section. 





2. Significance Tests. Now consider as in section 1 another sample S' of n'
values, $\{x'_{k'}\}$, $k' = 1, \ldots, n'$ (where $x'_i < x'_j$ if $i < j$), drawn at random from an
infinite universe U' as was S from U; but where U' and U are not necessarily
the same universe. In like manner it may be shown that, if x' is drawn at random
from U' and $p'_{k'}$ denotes $P(x' < x'_{k'})$, then

$$P(\phi' < p'_{k'} < \phi'') = \frac{(v+w+1)!}{v!\,w!} \int_{\phi'}^{\phi''} p^v q^w\,dp, \tag{8}$$

where q = 1 - p, v = k' - 1, w = n' - k', and $0 \le \phi' \le \phi'' \le 1$.
The probabilities in (4) and (8) are independent, obviously, whether U' is the
same as U or not. Accordingly, these relations make possible an evaluation
of $P(p_k < p'_{k'})$ under the circumstances where repeated sampling is applied to
both the case of S and to that of S'. With this understanding, then

$$P(p_k < p'_{k'}) = \frac{(r+s+1)!\,(v+w+1)!}{r!\,s!\,v!\,w!} \int_0^1 p_0^{\,v} q_0^{\,w} \left[\int_0^{p_0} p^r q^s\,dp\right] dp_0, \tag{9}$$

where, as before, r = k - 1, s = n - k, v = k' - 1, w = n' - k', q = 1 - p,
and $q_0 = 1 - p_0$.

[Chart 2: graphs for horses under streptococcus immunization, tetanus immunization,
and no immunization; abscissa in months.]

CHART 2. On each graph for horses receiving the indicated antigenic treatment and one
untreated horse, the curves in descending order respectively represent (1) esterified
cholesterol, (2) free cholesterol, and (3) one-tenth phospholipid concentration in plasma (in
mg. per 100 cc.). Corresponding 90-per-cent normal range estimates are indicated.

In a previous paper [4] a $\Psi$-function was defined as

$$\Psi(r, s, r', s') = \frac{\displaystyle\sum_{\alpha=0}^{r'} \binom{r+r'-\alpha}{r}\binom{s+s'+1+\alpha}{s}}{\displaystyle\binom{r+s+r'+s'+2}{r+s+1}} \tag{10}$$

for any four rational integers $r, s, r', s' \ge 0$; and it was shown in detail that the
right member of (9) is equal to $\Psi(r, s, v, w)$; whence we may write

$$P(p_k < p'_{k'}) = \Psi(k - 1,\ n - k,\ k' - 1,\ n' - k'). \tag{11}$$

Obviously, if U and U' are the same universe, then $p_k < p'_{k'}$ if and only if
$x_k < x'_{k'}$, and then we have

$$P(x_k < x'_{k'}) = \Psi(k - 1,\ n - k,\ k' - 1,\ n' - k') \tag{12}$$

in repeated random sampling applied to both sample types, S and S', respectively
of n and of n' observations. In the paper just mentioned, and in another [5]
the $\Psi$-function was further developed by extension of definition to include
$\Psi(r, s, -1, s') = 0$, and it was shown that

$$\Psi(r, s, r', s') = \Psi(r, r', s, s') = \Psi(s', r', s, r) = 1 - \Psi(s, r, s', r'). \tag{13}$$


Further demonstrations [5] included the relation,

$$\Psi(r, s, r', s') = \frac{\displaystyle\sum_{\alpha=0}^{s} \binom{r+r'+1}{r+1+\alpha}\binom{s+s'+1}{s-\alpha}}{\displaystyle\binom{r+s+r'+s'+2}{r+s+1}}, \tag{14}$$

which offers another form for calculation. The identities in (13) are particularly
useful to facilitate calculation where one of the four arguments is small. A
system for forming a table has also been developed [4, 5] in an economical form,
but tabulation has been given only for the arguments not exceeding 5.

Now, in applying a test based on relation (12) or on that for the complementary
probability, $P(x'_{k'} < x_k)$, which obviously, by (13), equals $\Psi(n - k,\ k - 1,\
n' - k',\ k' - 1)$, we may wish to exclude from the normal set of observations
those values obtained from animals later given the treatment in question in the
statistical significance test. The purpose would be to avoid violation of the
condition of independent sampling required. In the case of the tetanus antigen
treatment, we have an experience wherein 5 or more of 6 horses treated yield





values for a given plasma constituent less than the third in ascending order of
magnitude (namely $x_3$) in our independent set of normal values. Here n' = 6,
and n = 62 - 6 = 56. In accordance with the hypothesis that the treatment
in question does not affect normal causes of variation in the plasma constituent
under investigation we have $P(x'_{k'} < x_3) = \Psi(53, 2, 6 - k', k' - 1)$. This is
approximately $1.891(10)^{-5}$ for k' = 5, and $4.555(10)^{-7}$ for k' = 6. Obviously,
a rule for establishing the value of k to be used in such tests should be fixed in
advance without prejudice, as in the present case where we have taken
$k \ge g(n + 1) > k - 1$ for g = 5%.
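The probabilities just quoted can be recomputed exactly. The sketch below treats Ψ as the probability given by the double integral in (9) and evaluates it with a finite sum of binomial coefficients; that closed form is an assumption adopted here for illustration (the printed definition is not legible in this copy), and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def psi(r, s, r2, s2):
    """Exact P(p < p') for p with density ~ p^r q^s and p' with density ~ p'^r2 q'^s2."""
    # Assumed closed form; an empty sum (r2 = -1) gives the value 0.
    numerator = sum(
        comb(r + r2 - a, r) * comb(s + s2 + 1 + a, s) for a in range(r2 + 1)
    )
    return Fraction(numerator, comb(r + s + r2 + s2 + 2, r + s + 1))


# Tetanus example: n = 56 normal values, k = 3, n' = 6 treated horses.
n, n2, k = 56, 6, 3
for k2 in (5, 6):
    # P(x'_{k'} < x_k) = psi(n - k, k - 1, n' - k', k' - 1), the complementary form.
    value = psi(n - k, k - 1, n2 - k2, k2 - 1)
    print(k2, value, float(value))
```

The exact fractions have denominator 61,474,519, the number of ways of choosing which 6 of the 62 pooled ranks belong to the treated sample, and their decimal values are close to the figures quoted in the text.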

In the case of streptococcus immunization treatment, the corresponding test
would have n = 58, n' = 4, k = 3, and k' = 4, 3, or 2; which would yield
approximately $2.689(10)^{-5}$, $1.031(10)^{-3}$, or $1.817(10)^{-2}$, respectively for $P(x'_{k'} < x_3)$.
Thus it appears that where such values are found (intuitively it would
appear a fortiori if we compare instead with $x_k$ of the entire normal set of 62
values), their low magnitude appears to discredit the hypothesis that such
discrepancies are ascribable to mere chance normal variation in the quantitative
attribute investigated.

The tests proposed are free from any assumption concerning the form of the 
original distribution f(x). The illustrative material is only a part of that pre- 
sented with similar statistical treatment in the paper of Wadsworth and Hyman 
[3], which makes it apparent that the tests suggested here may be useful and 
powerful in analysis of biological and other experimental data. From a similar 
point of view, Hotelling and Pabst [6] developed tests of bi-variate correlation, 
and Milton Friedman has elaborated a multi-variate rank analysis [7], the tests 
being likewise free from any assumption about the form of the original
distributions. In a previous paper [1] confidence ranges for the median are based
similarly, employing relation (5) for the special case p = 1/2.


DIVISION OF LABORATORIES AND RESEARCH
NEW YORK STATE DEPARTMENT OF HEALTH
ALBANY, N. Y.


REFERENCES 


[1] W. R. Thompson, Annals of Mathematical Statistics, Vol. 7 (1936), p. 122.

[2] Tables of the Incomplete Beta-Function, edited by Karl Pearson, (Office of Biometrika,
University College, London), 1934, p. 494.

[3] Augustus Wadsworth and L. W. Hyman, Jour. Immunol., Vol. 35 (1938), p. 55.

[4] W. R. Thompson, Biometrika, Vol. 25 (1933), p. 285.

[5] W. R. Thompson, American Journal of Mathematics, Vol. 57 (1935), p. 450.

[6] H. Hotelling and M. R. Pabst, Annals of Mathematical Statistics, Vol. 7 (1936), p. 29.

[7] Milton Friedman, Jour. Amer. Stat. Assoc., Vol. 32 (1937), p. 675.





THE COMPUTATION OF MOMENTS WITH THE USE 
OF CUMULATIVE TOTALS 


By Paul S. Dwyer


1. Introduction. Various authors have shown how the moments of a frequency
distribution may be computed from cumulated frequencies.¹ In order
to make clear to the reader the type of technique under discussion there is
presented an illustration which is, essentially, that used by Hardy [2, p. 59].
The value $\sum f_x = 729$ is the last entry in column 4.

We use $C_1^1$ to denote the entry in column 4 which is opposite the smallest
variate (or class mark if the distribution is grouped). Similarly $C_2^1$ is the entry
above $C_1^1$, and $C_1^2$ the entry to the right of $C_1^1$, etc. In this notation the diagonal
entries, the ones underscored in Table I, are $C_1^1, C_2^2, C_3^3, \ldots$.

The moments² about the smallest variate can be expressed in terms of the
cumulations of Table I in different ways. One method utilizes the diagonal
entries and the differences of zero. Thus

$$\sum_0^6 x f_x = C_2^2 = 2916; \qquad \sum_0^6 x^2 f_x = C_2^2 + 2C_3^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_2^2 + 6C_3^3 + 6C_4^4 = 57996;$$

$$\sum_0^6 x^4 f_x = C_2^2 + 14C_3^3 + 36C_4^4 + 24C_5^5 = 278316, \text{ etc.}$$

A second method utilizes the entries in the next to the last row and the
differences of zero. Thus

$$\sum_0^6 x f_x = C_2^2 = 2916; \qquad \sum_0^6 x^2 f_x = -C_2^2 + 2C_2^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_2^2 - 6C_2^3 + 6C_2^4 = 57996;$$

$$\sum_0^6 x^4 f_x = -C_2^2 + 14C_2^3 - 36C_2^4 + 24C_2^5 = 278316, \text{ etc.}$$


¹ The reader is referred to references [1], ..., [15], at end of paper.
² It is to be noted that we are not talking about moments per unit frequency. We are
using the term in the sense used for example by Whittaker and Robinson. See [20, p. 18].





A third method, which seems to have escaped previous attention, involves
columnar entries and multipliers whose determination and properties are a chief
concern of this paper. Thus

$$\sum_0^6 x f_x = C_2^2 = 2916; \qquad \sum_0^6 x^2 f_x = C_2^3 + C_3^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_2^4 + 4C_3^4 + C_4^4 = 57996;$$

$$\sum_0^6 x^4 f_x = C_2^5 + 11C_3^5 + 11C_4^5 + C_5^5 = 278316, \text{ etc.}$$
It is possible also to obtain formulas when the cumulations are made from the 
smallest variate to the largest variate and, indeed, the whole theory of the 
present paper could be duplicated with such a theory of cumulation. 


TABLE I

Successive Frequency Cumulations

        (3)     (4)     (5)     (6)     (7)     (8)
  x     f_x     C^1     C^2     C^3     C^4     C^5

  6      64      64      64      64      64      64
  5     192     256     320     384     448     512
  4     240     496     816    1200    1648    2160
  3     160     656    1472    2672    4320    6480
  2      60     716    2188    4860    9180   15660
  1      12     728    2916    7776   16956   32616
  0       1     729    3645   11421   28377   60993

It is possible to obtain the columnar formulas from the well known diagonal
formulas. From the construction of Table I it is clear that

$$C_i^j = C_{i+1}^j + C_i^{j-1} \tag{1}$$

so that

$$C_2^2 = C_2^2; \quad C_2^2 + 2C_3^3 = C_2^3 + C_3^3; \quad C_2^2 + 6C_3^3 + 6C_4^4 = C_2^4 + 4C_3^4 + C_4^4;$$

$$C_2^2 + 14C_3^3 + 36C_4^4 + 24C_5^5 = C_2^5 + 11C_3^5 + 11C_4^5 + C_5^5.$$

Formula (1) can be used similarly in deriving columnar formulas from row
formulas, diagonal formulas from row formulas, etc.

The columnar method is here recommended as a useful substitute for the 
usual elementary method of computing moments. The many multiplications 
involved in the usual process are replaced by continued addition. The chief 







disadvantage of the method is the continual recording, although this obstacle 
is surmounted with an adding machine equipped with a recording tape. The 
resulting moments are easily checked with an adaptation of Charlier’s check, 
as is shown in section 8, and methods are given by which the multipliers are 
easily obtained. The method is also well adapted to the use of Hollerith 
machines. 

The introduction of such columnar multipliers tends to give a different empha- 
sis to the cumulative totals technique. The use of diagonal entries led logically 
to an emphasis upon factorial moments, while the columnar method tends to 
emphasize the more familiar power moments. The primary application here 
indicated is not to elaborate and specialized techniques, but rather to the simple, 
though often tedious, problem of the computation of power moments. 

The aims of this paper are then: 

(1) To show how moments may be computed from the columnar values of the 

successive cumulations, 

(2) To discover the properties of the columnar multipliers, 

(3) To present a general theory for computation of moments using cumulative 

totals. 


2. The Basic Cumulative Theorem. The use of (1) is not satisfactory in 
getting precise formulas for the columnar multipliers so we derive the columnar 
cumulative theory directly from first principles. We first prove 

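THEOREM I. Let x be any real number and let $u_x$ be a real function of x which is
0 when x < a and when x > a + k and which is not infinite for x = a, a + 1,
a + 2, ..., a + k. Let $v_x$ be a real function of x and $\bar{v}_x$, called range $v_x$, a
function such that $\bar{v}_x = v_x$ when x = a, a + 1, ..., a + k and $\bar{v}_x = 0$ at all points
outside the range a to a + k. If $\sum_{t=x}^{a+k} u_t$ is indicated by $Cu_x$ and $v_x - v_{x-1}$ by $\nabla v_x$,
$\bar{v}_x - \bar{v}_{x-1}$ by $\nabla \bar{v}_x$, then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \bar{v}_x = \sum_a^{a+k} Cu_x\,\nabla \bar{v}_x. \tag{3}$$

The values $u_x$, $v_x$, $Cu_x$, $\nabla \bar{v}_x$ are presented in Table II.
The theorem is proved by forming

$$\sum_a^{a+k} Cu_x\,\nabla \bar{v}_x = u_{a+k}\bar{v}_{a+k} + \cdots + u_{a+x}\bar{v}_{a+x} + \cdots + u_a \bar{v}_a = \sum_a^{a+k} u_x \bar{v}_x = \sum_a^{a+k} u_x v_x.$$

Theorem I can also be written as

$$\sum_0^k u_{a+x} v_{a+x} = \sum_0^k u_{a+x} \bar{v}_{a+x} = \sum_0^k Cu_{a+x}\,\nabla \bar{v}_{a+x}. \tag{4}$$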
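Theorem I is a summation by parts, and a small numerical check makes the role of the range function concrete. The sketch below uses hypothetical values of $u_x$ and $v_x$; the only modelling choice is that the value just below the range is taken as 0 when differencing, which is what the bar in $\nabla \bar{v}_x$ expresses.

```python
def cumulate_from_top(u):
    """u[i] holds u_{a+i}; returns Cu with Cu[i] = u_{a+i} + ... + u_{a+k}."""
    out, running = [0] * len(u), 0
    for i in range(len(u) - 1, -1, -1):
        running += u[i]
        out[i] = running
    return out


def range_difference(v):
    """Backward difference of the range function: the term below x = a is 0."""
    return [v[i] - (v[i - 1] if i > 0 else 0) for i in range(len(v))]


u = [3, 1, 4, 1, 5]  # hypothetical u_a, ..., u_{a+k}
v = [2, 7, 1, 8, 2]  # hypothetical v_a, ..., v_{a+k}

lhs = sum(ui * vi for ui, vi in zip(u, v))
rhs = sum(c * d for c, d in zip(cumulate_from_top(u), range_difference(v)))

# Applying the same step twice gives the second cumulation and second difference.
twice = sum(
    c * d
    for c, d in zip(
        cumulate_from_top(cumulate_from_top(u)),
        range_difference(range_difference(v)),
    )
)
print(lhs, rhs, twice)
```

The repeated application at the end anticipates the successive cumulation theorem of the next section.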
















3. The Successive Cumulation Theorem.

THEOREM II. If $C^2 u_x = C[Cu_x]$ and $\nabla^2 v_x = \nabla(\nabla v_x)$, etc., then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \bar{v}_x = \sum_a^{a+k} C^{s+1} u_x\,\nabla^{s+1} \bar{v}_x.$$

This theorem follows readily from Theorem I. If

$$U_x = Cu_x, \qquad V_x = \nabla \bar{v}_x,$$

then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \bar{v}_x = \sum_a^{a+k} U_x V_x = \sum_a^{a+k} CU_x\,\nabla \bar{V}_x = \sum_a^{a+k} C^2 u_x\,\nabla^2 \bar{v}_x.$$

This process can be extended as many times as desired so that

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \bar{v}_x = \sum_a^{a+k} C^{s+1} u_x\,\nabla^{s+1} \bar{v}_x. \tag{5}$$

TABLE II

Values of $x$, $u_x$, $v_x$, $Cu_x$, and $\nabla \bar{v}_x$

  x        u_x        v_x        Cu_x                                            ∇v̄_x

  a+k      u_{a+k}    v_{a+k}    u_{a+k}                                         v_{a+k} - v_{a+k-1}
  a+k-1    u_{a+k-1}  v_{a+k-1}  u_{a+k} + u_{a+k-1}                             v_{a+k-1} - v_{a+k-2}
  ...      ...        ...        ...                                             ...
  a+x      u_{a+x}    v_{a+x}    u_{a+k} + ... + u_{a+x}                         v_{a+x} - v_{a+x-1}
  ...      ...        ...        ...                                             ...
  a+1      u_{a+1}    v_{a+1}    u_{a+k} + ... + u_{a+x} + ... + u_{a+1}         v_{a+1} - v_a
  a        u_a        v_a        u_{a+k} + ... + u_{a+x} + ... + u_{a+1} + u_a   v_a





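This can also be written as

$$\sum_0^k u_{a+x} v_{a+x} = \sum_0^k u_{a+x} \bar{v}_{a+x} = \sum_0^k C^{s+1} u_{a+x}\,\nabla^{s+1} \bar{v}_{a+x}. \tag{6}$$

In order to determine the values $\nabla^{s+1} \bar{v}_{a+x}$, $0 \le x \le k$, we note that

$$\nabla^{s+1} v_{a+x} = \sum_{t=0}^{s+1} (-1)^t \binom{s+1}{t} v_{a+x-t} \tag{7}$$

so that

$$\nabla^{s+1} \bar{v}_{a+x} = \sum_{t=0}^{s+1} (-1)^t \binom{s+1}{t} \bar{v}_{a+x-t}. \tag{8}$$

We also know that, for $x \le k$,

$$\bar{v}_{a+x-t} = \begin{cases} v_{a+x-t} & \text{when } t \le x, \\ 0 & \text{when } t > x, \end{cases} \tag{9}$$

so that

$$\nabla^{s+1} \bar{v}_{a+x} = \sum_{t=0}^{x} (-1)^t \binom{s+1}{t} v_{a+x-t}, \qquad x \le s, \tag{10}$$

$$\nabla^{s+1} \bar{v}_{a+x} = \sum_{t=0}^{s+1} (-1)^t \binom{s+1}{t} v_{a+x-t} = \nabla^{s+1} v_{a+x}, \qquad x > s. \tag{11}$$

The formula (6) can then be written

$$\sum_a^{a+k} u_x v_x = \sum_0^k u_{a+x} v_{a+x} = \sum_0^s C^{s+1} u_{a+x}\,\nabla^{s+1} \bar{v}_{a+x} + \sum_{s+1}^k C^{s+1} u_{a+x}\,\nabla^{s+1} v_{a+x}. \tag{12}$$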


4. Moments from the Cumulated Frequencies. If $u_{a+x} = f_{a+x}$,
$v_{a+x} = (a + x)^s$, then (6) gives

$$\sum_0^k (a+x)^s f_{a+x} = \sum_0^k C^{s+1} f_{a+x}\,\nabla^{s+1}\overline{(a+x)^s}. \tag{13}$$

A more useful formula, obtained from (12), is

$$\sum_0^k (a+x)^s f_{a+x} = \sum_0^s C^{s+1} f_{a+x}\,\nabla^{s+1}\overline{(a+x)^s}, \tag{14}$$

since $\nabla^{s+1}(a + x)^s = 0$. We have then

THEOREM III. The values of the s-th moments can be obtained from the last
s + 1 entries of the (s + 1)st cumulation of the frequencies. The multipliers are
the values

$$\nabla^{s+1}\overline{(a+x)^s} = \sum_{t=0}^{x} (-1)^t \binom{s+1}{t} (a+x-t)^s. \tag{15}$$


Cor. 1. When a = 0, i.e., when the moments are measured about the smallest
variate, the multipliers are

$$\nabla^{s+1}\overline{x^s} = \sum_{t=0}^{x} (-1)^t \binom{s+1}{t} (x-t)^s. \tag{16}$$

Cor. 2. When a = 1, the multipliers are

$$\nabla^{s+1}\overline{(1+x)^s} = \sum_{t=0}^{x} (-1)^t \binom{s+1}{t} (1+x-t)^s. \tag{17}$$

Cor. 3. If the moments are measured about a fixed value, p, then the new
smallest variate is a - p = a' and the multipliers are $\nabla^{s+1}\overline{(a'+x)^s}$.




Cor. 4. If p is the mean, m, then a' = a - m. If in addition a = 0, then
a' = -m and the multipliers giving moments about the mean are $\nabla^{s+1}\overline{(x-m)^s}$.
Now

$$m = \frac{\sum_0^k x f_x}{\sum_0^k f_x} = \frac{C_1^2\,\nabla^2\bar{0} + C_2^2\,\nabla^2\bar{1}}{C_1^1} = \frac{C_2^2}{C_1^1}.$$

It follows that the multipliers giving the moments about the mean are

$$\nabla^{s+1}\overline{\left(x - \frac{C_2^2}{C_1^1}\right)^s}. \tag{18}$$

It is to be noted that the moments about different points are obtained by
applying different multipliers to the same cumulated frequencies.
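Formula (15) and Theorem III translate directly into a short routine. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the frequencies are those of Table I listed from the smallest variate upward, and exact fractions are used so that a non-integral origin would also work.

```python
from fractions import Fraction
from math import comb


def multipliers(s, a):
    """Range differences of (a + x)^s for x = 0..s, computed from formula (15)."""
    a = Fraction(a)
    return [
        sum((-1) ** t * comb(s + 1, t) * (a + x - t) ** s for t in range(x + 1))
        for x in range(s + 1)
    ]


def cumulation(freqs_asc, order):
    """order-th cumulation, cumulating from the largest variate; index 0 is the smallest."""
    col = list(freqs_asc)
    for _ in range(order):
        running = 0
        for i in range(len(col) - 1, -1, -1):
            running += col[i]
            col[i] = running
    return col


def moment(freqs_asc, s, a):
    """Sum of (a + x)^s f, from the last s + 1 entries of the (s + 1)-th cumulation."""
    col = cumulation(freqs_asc, s + 1)
    return sum(m * col[x] for x, m in enumerate(multipliers(s, a)))


f = [1, 12, 60, 160, 240, 192, 64]  # Table I, x = 0, 1, ..., 6
mean = moment(f, 1, 0) / moment(f, 0, 0)
print(mean, [moment(f, s, -mean) for s in range(1, 5)])
```

The same cumulated columns serve for every origin; only the list returned by `multipliers` changes, which is the point made in the sentence above.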


5. Values of the multipliers. The values of the multipliers may be computed
from (15). Thus $\nabla^3\overline{(a+1)^2} = (a+1)^2 - 3a^2 = -2a^2 + 2a + 1$. This becomes
2ab + 1 when 1 - a is set equal to b. Values of the multipliers for the
most common values of s and x are presented in Table III.

TABLE III
Values of $\nabla^{s+1}\overline{(a+x)^s}$

  x     s = 2       s = 3                s = 4

  4                                      b^4
  3                 b^3                  4b^3 a + 6b^2 + 4b + 1
  2     b^2         3b^2 a + 3b + 1      6a^2 b^2 + 12ab + 11
  1     2ab + 1     3a^2 b + 3a + 1      4a^3 b + 6a^2 + 4a + 1
  0     a^2         a^3                  a^4

When a = 0, b = 1 and the multipliers are 1; 0, 1; 0, 1, 1; 0, 1, 4, 1; 0, 1, 11, 11, 1;
etc. as indicated in section 1. When a = 1, b = 0 and the multipliers are 1;
1, 0; 1, 1, 0; 1, 4, 1, 0; 1, 11, 11, 1, 0; etc. When the moments are measured about a
fixed point, p, it is only necessary to compute a' = a - p and to use a' for a
and b' = 1 - a' for b in Table III.

We illustrate the use of the multipliers by application to the problem of
Table I. The moments about the smallest variate are computed in section 1.
The moments, when a = 1, are

$$\sum_0^6 (x+1) f_x = C_1^2 = 3645; \qquad \sum_0^6 (x+1)^2 f_x = C_1^3 + C_2^3 = 19197;$$

$$\sum_0^6 (x+1)^3 f_x = C_1^4 + 4C_2^4 + C_3^4 = 105381;$$

$$\sum_0^6 (x+1)^4 f_x = C_1^5 + 11C_2^5 + 11C_3^5 + C_4^5 = 598509.$$

The moments about the mean are found by forming $C_2^2/C_1^1 = 2916/729 = 4$. Then
a' = -4 and the multipliers are 1; -4, 5; 16, -39, 25; -64, 229, -284, 125;
256, -1199, 2171, -1829, 625; etc. so that, writing $\bar{x} = x - m$,

$$\sum_0^6 \bar{x} f_x = 0; \quad \sum_0^6 \bar{x}^2 f_x = 972; \quad \sum_0^6 \bar{x}^3 f_x = -324; \quad \sum_0^6 \bar{x}^4 f_x = 3564.$$
Since the values of $\nabla^{s+1}\overline{(x - C_2^2/C_1^1)^s}$ are expressible in terms of $C_1^1$ and $C_2^2$, it
follows that the values of $\sum_0^k \bar{x}^s f_x$ are expressible in terms of cumulations. For
example a formula for the second moment about the mean, which is essentially
one given by Whittaker and Robinson [7, p. 193] is

$$\sum_a^{a+k} \bar{x}^2 f_x = C_2^2 + 2C_3^3 - \frac{(C_2^2)^2}{C_1^1}. \tag{19}$$

However the general method described above, supplemented with the techniques
of succeeding sections, is preferred to the development and use of such
formulas.


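6. Recursion Property of the Multipliers. It is not readily apparent from
Table III how the multipliers of the (s + 1)-th cumulations can be obtained
from the multipliers of the s-th cumulations. It is possible to establish a
recursion formula which is useful for this purpose. Now, for $0 \le x \le s$,

$$\nabla^{s+1}\overline{(a+x)^s} = (a+x)^s + \sum_{t=1}^{x} (-1)^t \binom{s+1}{t} (a+x-t)^s,$$

$$(a+x)\,\nabla^{s}\overline{(a+x)^{s-1}} = (a+x)^s + \sum_{t=1}^{x} (-1)^t \binom{s}{t} (a+x-t)^{s-1}(a+x),$$

$$(s+1-a-x)\,\nabla^{s}\overline{(a+x-1)^{s-1}} = \sum_{t=1}^{x} (-1)^{t-1} \binom{s}{t-1} (a+x-t)^{s-1}(s+1-a-x),$$

and since

$$\binom{s}{t}(a+x) - \binom{s}{t-1}(s+1-a-x) = \binom{s+1}{t}(a+x-t)$$

it follows that

$$\nabla^{s+1}\overline{(a+x)^s} = (a+x)\,\nabla^{s}\overline{(a+x)^{s-1}} + (s+1-a-x)\,\nabla^{s}\overline{(a+x-1)^{s-1}}. \tag{20}$$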






















When a = 0 we have

$$\nabla^{s+1}\overline{x^s} = x\,\nabla^{s}\overline{x^{s-1}} + (s+1-x)\,\nabla^{s}\overline{(x-1)^{s-1}}. \tag{21}$$

Formulas (20) and (21), though somewhat formidable in appearance, are easy
to apply. Thus $\nabla^3\overline{(a+2)^2} = (a+2)\,\nabla^2\overline{(a+2)} + (1-a)\,\nabla^2\overline{(a+1)}$. The
recursion formula is especially useful in building up tables of multipliers. The
following form is recommended:

As successive columnar headings use the values a, a + 1, a + 2, etc. and as
successive row headings use 1 - a, 2 - a, 3 - a, etc. Then $\nabla\overline{a^0} = 1$ is placed
in the upper left cell, $\nabla^2\bar{a}$ directly below $\nabla\overline{a^0}$, $\nabla^2\overline{(a+1)}$ to the right of $\nabla\overline{a^0}$, etc.
The values of $\nabla^3\overline{(a+x)^2}$ are placed in the next diagonal, etc. If this process is
continued the entry $\nabla^{s+1}\overline{(a+x)^s}$ will have the entry $\nabla^{s}\overline{(a+x)^{s-1}}$ directly above
it and the entry $\nabla^{s}\overline{(a+x-1)^{s-1}}$ on its left. Also the columnar heading is
a + x and the row heading s + 1 - a - x so that any entry is obtained by
adding the product of the entry above it and the columnar heading to the
product of the entry to the left and the row heading. The values of $\nabla^{s+1}\overline{x^s}$
are obtained by placing a = 0. They are presented, in Table IV, through s = 8.














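TABLE IV
Values of $\nabla^{s+1}\overline{x^s}$
(columnar headings x; row headings s + 1 - x)

         0     1     2      3       4       5      6     7    8

  1      1     1     1      1       1       1      1     1    1
  2      0     1     4     11      26      57    120   247
  3      0     1    11     66     302    1191   4293
  4      0     1    26    302    2416   15619
  5      0     1    57   1191   15619
  6      0     1   120   4293
  7      0     1   247
  8      0     1
  9      0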

The table is easily extended to higher values of s. If a table of values of
$\nabla^{s+1}\overline{(x+1)^s}$ is constructed, it will be found to be like Table IV with columns and
rows interchanged. Hence the values of $\nabla^{s+1}\overline{(x+1)^s}$ are obtained from Table
IV by reading the multipliers down the diagonal. Thus the values $\nabla^4\overline{(x+1)^3}$
are 1, 4, 1, 0, etc.
The ease with which the multipliers may be computed is illustrated with
a = -4. In this case we have

TABLE V
Values of $\nabla^{s+1}\overline{(x+a)^s}$ with a = -4

         -4      -3      -2      -1      0

  5       1       5      25     125    625
  6      -4     -39    -284   -1829
  7      16     229    2171
  8     -64   -1199
  9     256

These values agree with those computed more laboriously in section 5.
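The tabular scheme of section 6 can be followed mechanically. In the sketch below, which is only an illustration with hypothetical function names, each diagonal of the table is built from the previous one by recursion (20) and then compared with the direct formula (15); an integer origin a is assumed.

```python
from math import comb, factorial


def direct(s, x, a):
    """Multiplier for given s and x from formula (15), for an integer origin a."""
    return sum((-1) ** t * comb(s + 1, t) * (a + x - t) ** s for t in range(x + 1))


def multiplier_rows(a, s_max):
    """rows[s][x] for s = 0..s_max and x = 0..s, using only recursion (20)."""
    rows = [[1]]  # s = 0: the single multiplier is 1
    for s in range(1, s_max + 1):
        # The previous diagonal has no x = s entry; that difference vanishes.
        prev = rows[-1] + [0]
        row = []
        for x in range(s + 1):
            left = prev[x - 1] if x > 0 else 0
            # (entry above) * columnar heading + (entry to the left) * row heading
            row.append((a + x) * prev[x] + (s + 1 - a - x) * left)
        rows.append(row)
    return rows


rows_zero = multiplier_rows(0, 8)    # diagonals of Table IV
rows_mean = multiplier_rows(-4, 4)   # diagonals of Table V
print(rows_zero[4], rows_mean[4])
```

Each list `rows[s]` is one diagonal of the printed table, read from the first column toward the first row.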


‘ 
7. Value of $\sum_0^k \nabla^{s+1}\overline{(x+a)^s}$. It is to be noted in Tables III, IV, V that the
sum of the entries in the diagonal having s + 1 terms is s!. This is generally
true and results from the fact that

$$\sum_0^k \nabla^{s+1}\overline{(x+a)^s} = \sum_0^s \nabla^{s+1}\overline{(x+a)^s} = s!. \tag{22}$$

In obtaining the values of $\sum_0^k \nabla^{s+2}\overline{(x+a)^{s+1}}$ from the value of $\sum_0^k \nabla^{s+1}\overline{(x+a)^s}$
it is noted that $\nabla^{s+1}\overline{(x+a)^s}$ is used but twice. Once it is multiplied by a + x
and once by s + 1 - a - x so that the net result is a multiplication by s + 1.
It follows that $\sum \nabla^{s+2}\overline{(x+a)^{s+1}} = (s+1) \sum \nabla^{s+1}\overline{(x+a)^s}$ and since
$\sum \nabla\overline{(x+a)^0} = 1$, $\sum \nabla^3\overline{(x+a)^2} = 2!$ so that in general $\sum \nabla^{s+1}\overline{(x+a)^s} = s!$.
This property is useful in checking the values of the computed multipliers.
8. The adaptation of the Charlier check. An adaptation of the Charlier
check serves as an excellent check for the computed moments. It is recalled
that the Charlier check gives

$$\sum_0^k (x+1)^s f_x = \sum_{t=0}^{s} \binom{s}{t} \sum_0^k x^t f_x. \tag{23}$$

The components of the right hand member are computed by cumulative totals
as indicated above. The left hand member is obtained by applying different
multipliers to the same cumulated frequencies. Thus $\sum_a^{a+k} (x+1)^s f_x = \sum_0^k (x+a+1)^s f_{x+a}$
and the multipliers of the cumulated frequencies are $\nabla^{s+1}\overline{(x+a')^s}$
where a' = a + 1.

If a = 0 the Charlier check multipliers are the values
$\nabla^{s+1}\overline{(x+1)^s}$ which can be read from Table IV. For example

$$\sum_0^6 (x+1)^4 f_x = C_1^5 + 11C_2^5 + 11C_3^5 + C_4^5 = 598509$$

and this checks with $\sum_0^6 x^4 f_x + 4\sum_0^6 x^3 f_x + 6\sum_0^6 x^2 f_x + 4\sum_0^6 x f_x + \sum_0^6 f_x$.
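The arithmetic of the check can be confirmed directly. The sketch below is not the cumulative method itself; it simply evaluates both members of (23) by ordinary power sums for the frequencies of Table I, with hypothetical helper names.

```python
from math import comb

# Frequencies of Table I, keyed by x = 0, 1, ..., 6.
f = dict(enumerate([1, 12, 60, 160, 240, 192, 64]))


def power_sum(s, shift=0):
    """Direct value of the sum of (x + shift)^s f_x."""
    return sum((x + shift) ** s * fx for x, fx in f.items())


def charlier_rhs(s):
    """Right member of (23): binomial expansion in the moments about zero."""
    return sum(comb(s, t) * power_sum(t) for t in range(s + 1))


print(power_sum(4, 1), charlier_rhs(4))
```

A disagreement between the two printed numbers would signal an error in one of the computed moments.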


9. Application to factorial moments. When $u_x = f_x$, $v_x = x^{(s)} = x(x-1) \cdots (x-s+1)$,

$$\sum_0^k x^{(s)} f_x = \sum_0^k C^{s+1} f_x\,\nabla^{s+1}\overline{x^{(s)}}$$

and since $\nabla^{s+1}\overline{x^{(s)}}$ is 0 when $s < x \le k$, is s! when x = s, is 0 when $0 \le x < s$,

$$\sum_0^k x^{(s)} f_x = \sum_s^k x^{(s)} f_x = s!\,C_{s+1}^{s+1}. \tag{24}$$

It follows that the underscored terms of Table I, when multiplied by s!, give
the factorial moments. Factorial moments, first used by Sheppard [4], have
since come into prominence largely because of this ease of computation.

The coefficients of $(a+b)^x$ are $1,\ x,\ \frac{x^{(2)}}{2!},\ \ldots,\ \frac{x^{(s)}}{s!},\ \ldots$. If we define
the binomial moment by $B_s = \sum_0^k \frac{x^{(s)}}{s!} f_x$ [6, p. 278] then $B_s = \frac{1}{s!} \sum_0^k x^{(s)} f_x = C_{s+1}^{s+1}$.


It is also possible to show that the entries under the main diagonal are binomial moments. In Table I, for example, we let $a = 1$ and add the additional row $x = 0$ with 0 frequency. Then $C^1_1 = 729$, $C^1_0 = 729$, $C^2_0 = 729 + 3645 = 4374$, etc. The new diagonal terms are directly under the old diagonal terms and give

$$B_{s,1} = \sum_1^7 \frac{x^{(s)}}{s!} f_x = \sum_0^6 \frac{(x+1)^{(s)}}{s!} f_x.$$

In general the terms $B_{s,l}$ are given $l$ rows below the terms $B_s$ and the factorial moments are $s!\,B_{s,l}$. Then

$$F_{s,l} = s!\,C^{s+1}_{s-l}. \tag{25}$$

For example in the problem of Table I, $F_{4,3} = \sum_3^9 x^{(4)} f_x = 4!\,C^5_1 = 782{,}784$. The method is especially adapted to the use of Hollerith machines, for positive integral values of $l$, since it is only necessary to have the machine continue its cumulation.
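Formula (25) can be checked in the same way. This sketch assumes the illustrative frequencies and takes the subscript of C to be the original (unshifted) value of x; the names `shifted_factorial_moment` and `direct` are illustrative only.

```python
from itertools import accumulate
from math import factorial

F_DESC = [64, 192, 240, 160, 60, 12, 1]   # f_x for x = 6, 5, ..., 0
K = len(F_DESC) - 1

def cumulation(values, order):
    """Cumulate `order` times and return the last column keyed by x."""
    col = list(values)
    for _ in range(order):
        col = list(accumulate(col))
    return {K - i: c for i, c in enumerate(col)}

def falling(x, s):
    """x^(s) = x (x - 1) ... (x - s + 1)."""
    out = 1
    for i in range(s):
        out *= x - i
    return out

def shifted_factorial_moment(s, l):
    """F_{s,l} = s! * C^{s+1}_{s-l}, used here for 0 <= l <= s."""
    return factorial(s) * cumulation(F_DESC, s + 1)[s - l]

def direct(s, l):
    """sum (x + l)^(s) f_x over the original values x = 0, ..., K."""
    return sum(falling(K - i + l, s) * f for i, f in enumerate(F_DESC))

f_43 = shifted_factorial_moment(4, 3)
print(f_43, direct(4, 3))
```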





298 PAUL S. DWYER 
















10. The cumulations of $xf_x$. It is possible to use the cumulations of $xf_x$ in securing the values of the moments. Now

$$\sum_a^{a+k} x^s f_x = \sum_0^k (x+a)^s f_{x+a} = \sum_0^k (x+a) f_{x+a}\,(x+a)^{s-1} = \sum_0^k C^s\big((x+a)f_{x+a}\big)\,\nabla^s (x+a)^{s-1}. \tag{26}$$

When $a = 0$, (26) becomes

$$\sum_0^k x^s f_x = \sum_0^k C^s(xf_x)\,\nabla^s x^{s-1}. \tag{27}$$


We compute the cumulations of $xf_x$ for the problem of Table I. These are given in Table VI.

TABLE VI
Cumulations of $xf_x$

 x    f_x   xf_x    C^1     C^2     C^3      C^4
 6     64    384    384     384     384      384
 5    192    960   1344    1728    2112     2496
 4    240    960   2304    4032    6144     8640
 3    160    480   2784    6816   12960    21600
 2     60    120   2904    9720   22680    44280
 1     12     12   2916   12636   35316    79596
 0      1      0   2916   15552   50868   130464

so that

$$\sum_0^6 xf_x = 2916; \quad \sum_0^6 x^2f_x = 12636; \quad \sum_0^6 x^3f_x = 35316 + 22680 = 57996;$$

$$\sum_0^6 x^4f_x = 79596 + 4(44280) + 21600 = 278316.$$
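The four sums above can be reproduced from formula (27). The sketch below assumes the $xf_x$ column of Table VI and assumes that the differences $\nabla^s x^{s-1}$ are formed with all terms before x = 0 counted as zero; the helper names are illustrative only.

```python
from itertools import accumulate

XF_DESC = [384, 960, 960, 480, 120, 12, 0]   # x f_x for x = 6, 5, ..., 0
K = len(XF_DESC) - 1

def cumulation(values, order):
    """Cumulate `order` times and return the last column keyed by x."""
    col = list(values)
    for _ in range(order):
        col = list(accumulate(col))
    return {K - i: c for i, c in enumerate(col)}

def nabla(seq, order):
    """Backward differences of seq, with terms before seq[0] taken as 0."""
    for _ in range(order):
        seq = [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]
    return seq

def moment(s):
    """sum x^s f_x = sum C^s_x(x f_x) * nabla^s x^(s-1), formula (27)."""
    c = cumulation(XF_DESC, s)
    mult = nabla([x ** (s - 1) for x in range(K + 1)], s)
    return sum(c[x] * mult[x] for x in range(K + 1))

moments = [moment(s) for s in range(1, 5)]
print(moments)
```

For s = 4 the multipliers are 1, 4, 1 at x = 1, 2, 3, which is the combination 79596 + 4(44280) + 21600 used in the text.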









In getting moments about the mean from the cumulations of $xf_x$, the following method is recommended.

$$\sum_0^k \bar{x}^{s+1} f_x = \sum_0^k \bar{x}^s (x - m) f_x = \sum_0^k \bar{x}^s xf_x - m\sum_0^k \bar{x}^s f_x \tag{28}$$

and

$$\sum_0^k \bar{x}^s xf_x = \sum_0^k C^{s+1}(xf_x)\,\nabla^{s+1}(x - m)^s. \tag{29}$$

When $s = 1$, (28) gives $\sum_0^k \bar{x}^2 f_x = \sum_0^k \bar{x}\,xf_x - m\sum_0^k \bar{x} f_x$ and


COMPUTATION OF MOMENTS WITH CUMULATIVE TOTALS 





$$\sum_0^k \bar{x}^2 f_x = \sum_0^k \bar{x}\,xf_x. \tag{30}$$









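In the illustrative problem $a = -4$ so that

$$\sum_0^6 \bar{x}\,xf_x = -4(15552) + 5(12636) = 972$$

$$\sum_0^6 \bar{x}^2 xf_x = 16(50868) - 39(35316) + 25(22680) = 3564$$

$$\sum_0^6 \bar{x}^3 xf_x = 2268$$

and

$$\sum_0^6 \bar{x}^2 f_x = 972; \quad \sum_0^6 \bar{x}^3 f_x = 3564 - 4(972) = -324; \quad \sum_0^6 \bar{x}^4 f_x = 3564.$$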
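The moments about the mean can be traced through (28) and (29) numerically. This is a sketch under the assumptions that $\bar{x} = x - m$ with m = 4 for the illustrative data and that differences are formed with terms before x = 0 counted as zero; the names `mixed` and `central` are illustrative only.

```python
from itertools import accumulate

F_DESC = [64, 192, 240, 160, 60, 12, 1]      # f_x for x = 6, 5, ..., 0
XF_DESC = [384, 960, 960, 480, 120, 12, 0]   # x f_x for x = 6, 5, ..., 0
K = len(F_DESC) - 1
N = sum(F_DESC)
assert sum(XF_DESC) % N == 0                 # the mean is an integer here
m = sum(XF_DESC) // N                        # m = 4

def cumulation(values, order):
    """Cumulate `order` times and return the last column keyed by x."""
    col = list(values)
    for _ in range(order):
        col = list(accumulate(col))
    return {K - i: c for i, c in enumerate(col)}

def nabla(seq, order):
    """Backward differences of seq, with terms before seq[0] taken as 0."""
    for _ in range(order):
        seq = [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]
    return seq

def mixed(s):
    """sum xbar^s * x f_x from the cumulations of x f_x, formula (29)."""
    c = cumulation(XF_DESC, s + 1)
    mult = nabla([(x - m) ** s for x in range(K + 1)], s + 1)
    return sum(c[x] * mult[x] for x in range(K + 1))

# Recursion (28): sum xbar^(s+1) f = sum xbar^s x f - m * sum xbar^s f.
central = {0: N, 1: 0}
for s in range(1, 4):
    central[s + 1] = mixed(s) - m * central[s]
print([mixed(s) for s in range(1, 4)], central)
```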


Formula (30) is of note since it permits the determination of $\sum_0^k \bar{x}^2 f_x$ directly from the cumulations of $xf_x$.

The factorial moments are also related to the cumulations of $xf_x$. Thus

$$\sum x^{(s)} f_x = \sum (x-1)^{(s-1)}\,xf_x = \sum C^s(xf_x)\,\nabla^s (x-1)^{(s-1)} \tag{31}$$

which results in $\sum_0^k x^{(s)} f_x = (s-1)!\,C^s_s(xf_x)$.

It follows that

$$C^s_s(xf_x) = s\,C^{s+1}_s(f_x).$$

For example, the underscored terms of Table VI are respectively 1, 2, 3, 4 times the underscored terms of Table I.
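The relation between the two sets of diagonal terms can be confirmed as follows. This is a sketch assuming the illustrative frequencies, with the diagonal term of order s taken to be the entry in the row x = s.

```python
from itertools import accumulate

F_DESC = [64, 192, 240, 160, 60, 12, 1]      # f_x for x = 6, 5, ..., 0
XF_DESC = [384, 960, 960, 480, 120, 12, 0]   # x f_x for x = 6, 5, ..., 0
K = len(F_DESC) - 1

def cumulation(values, order):
    """Cumulate `order` times and return the last column keyed by x."""
    col = list(values)
    for _ in range(order):
        col = list(accumulate(col))
    return {K - i: c for i, c in enumerate(col)}

# Diagonal terms C^s_s(x f_x) of Table VI and s * C^{s+1}_s(f_x) of Table I.
diag_xf = [cumulation(XF_DESC, s)[s] for s in range(1, 5)]
diag_f = [cumulation(F_DESC, s + 1)[s] for s in range(1, 5)]
scaled = [s * d for s, d in zip(range(1, 5), diag_f)]
print(diag_xf, diag_f, scaled)
```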

In general the cumulations of $xf_x$, rather than of $f_x$, are recommended since $C(xf_x)$ can be computed and recorded almost as quickly as $C(f_x)$, since one less cumulation is needed to obtain a specific moment, and since the multipliers needed to get a specific moment are smaller. A technique based on the cumulations of $xf_x$ is especially adapted to the use of Hollerith machines. Let us take $x_x$ to represent the sum of the $x$'s for all items in the distribution having the same value of $x$. Then $xf_x = x_x$ and we have

$$\sum_a^{a+k} x^s f_x = \sum_a^{a+k} x^{s-1} x_x = \sum_a^{a+k} C^s(x_x)\,\nabla^s(x^{s-1}). \tag{32}$$

If the cards are sorted for $x$ and the tabulator is wired to print cumulative totals each time $x$ changes, the recording tape gives the successive values of $C(x_x)$. (Care must be taken that there are no blank values of $x$.)

If a summary punch is available, these cumulations are punched on cards as they are cumulated and these summary cards are used in getting higher cumulations.

If no summary punch is available, it is possible to obtain $\sum x^2 f_x$ by the application of Theorem I. Thus

$$\sum_a^{a+k} x^2 f_x = \sum_a^{a+k} x\,x_x = \sum_a^{a+k} C(x_x)\,\nabla(x)$$

and since $\nabla(x) = a$ when $x = a$ and $\nabla(x) = 1$ when $x > a$, it follows that $\sum_a^{a+k} x^2 f_x$ can be obtained by adding the entries above the last and then adding the last entry multiplied by $a$. This is essentially the Mendenhall-Warren-Hollerith method of getting $\sum x^2 f_x$ [9, p. 27].

In case $a = 0$ the technique amounts simply to adding all the entries above the bottom one.

The value $\sum x^3 f_x$ can be obtained similarly from the first order cumulations. Thus

$$\sum_a^{a+k} x^3 f_x = \sum_a^{a+k} x^2 x_x = \sum_a^{a+k} C(x_x)\,\nabla(x^2) \tag{33}$$

and since $\nabla(x^2) = a^2$ when $x = a$, $\nabla(x^2) = 2x - 1$ when $x > a$, it follows that

$$\sum_a^{a+k} x^3 f_x = a^2 C_a(x_x) + \sum_{a+1}^{a+k} C(x_x)(2x - 1). \tag{34}$$

When $a = 0$, (34) becomes

$$\sum_0^k x^3 f_x = \sum_1^k C(x_x)(2x - 1) \tag{35}$$

so that the multipliers are the successive odd integers. Thus from the first order cumulations of Table VI we have

$$\sum_0^6 xf_x = 2916; \quad \sum_0^6 x^2 f_x = 12636; \quad \sum_0^6 x^3 f_x = 57996.$$
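The first order technique can be sketched as below for any integral origin a. The function name and the dictionary layout are illustrative assumptions; the data are the $xf_x$ column of Table VI, and a second call with every x raised by one exercises the case a = 1.

```python
from itertools import accumulate

def sums_from_first_cumulation(xf_by_x, a):
    """Return (sum x f, sum x^2 f, sum x^3 f) from the first order
    descending cumulation of x f_x. Every integer x from a to a + k
    must be present, blank classes being entered as 0."""
    xs = sorted(xf_by_x, reverse=True)
    c = dict(zip(xs, accumulate(xf_by_x[x] for x in xs)))
    sum_x = c[a]
    sum_x2 = a * c[a] + sum(c[x] for x in xs if x > a)
    sum_x3 = a * a * c[a] + sum(c[x] * (2 * x - 1) for x in xs if x > a)
    return sum_x, sum_x2, sum_x3

f = {6: 64, 5: 192, 4: 240, 3: 160, 2: 60, 1: 12, 0: 1}
table_vi = sums_from_first_cumulation({x: x * fx for x, fx in f.items()}, a=0)
shifted = sums_from_first_cumulation(
    {x + 1: (x + 1) * fx for x, fx in f.items()}, a=1)
print(table_vi, shifted)
```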


The cumulative method can also be applied to the method of digiting [17, p. 425].

It is also possible to obtain the moments from the cumulations of $x^2f_x$, $x^3f_x$, etc., since

$$\sum_a^{a+k} x^{s+2} f_x = \sum_a^{a+k} x^s\,x^2f_x = \sum_a^{a+k} C^{s+1}(x^2f_x)\,\nabla^{s+1}(x^s)$$

$$\sum_a^{a+k} x^{s+3} f_x = \sum_a^{a+k} x^s\,x^3f_x = \sum_a^{a+k} C^{s+1}(x^3f_x)\,\nabla^{s+1}(x^s)$$

but the cumulations of $xf_x$ are preferable for most purposes. The Charlier check works in all cases. It should be noted that the indicated Hollerith technique demands only the customary tabulator and not the expensive, time consuming, card punching multiplier [16].


11. Product Moments. Correlation. It is possible to apply the cumulative technique in getting product moments involving two variables. If we let $y_x$ be the sum of all the values of $y$ having the same value of $x$, then

$$\sum x^s y f_{xy} = \sum_a^{a+k} y_x x^s = \sum_a^{a+k} C^{s+1}(y_x)\,\nabla^{s+1}(x^s) \tag{36}$$

so that the multipliers are the same as those previously used. When Hollerith machines are used, it is only necessary to sort the cards for $x$ and to wire the machine to give cumulations on variables $x$, $y$, $z$, etc. If the machine is adjusted to take totals with each change in $x$, the tape records simultaneously the values of $C(x_x)$, $C(y_x)$, $C(z_x)$, etc. With a summary punch it is possible to form successive cumulations easily. The values $\sum x^{s+1}$, $\sum x^s y$, $\sum x^s z$, etc. are then obtained by applying the multipliers. When $s = 1$, (36) becomes

$$\sum xy f_{xy} = \sum_a^{a+k} C^2(y_x)\,\nabla^2(x) \tag{37}$$

so that the multipliers are $a$, $1 - a$, 0, 0, etc. When $a = 0$, the multipliers are 0, 1, 0, 0, etc. and when $a = 1$, they are 1, 0, 0, etc.

When no summary punch is available, it is necessary to obtain the values of the moments from the first order cumulations. Using Theorem I

$$\sum xy f_{xy} = \sum_a^{a+k} C(y_x)\,\nabla(x) = aC_a(y_x) + \sum_{a+1}^{a+k} C(y_x). \tag{38}$$

This formula serves as the basis of the Mendenhall-Warren-Hollerith Correlation Method [9, p. 27].


It can be shown in similar fashion that

$$\sum x^2 y f_{xy} = a^2 C_a(y_x) + \sum_{a+1}^{a+k} C(y_x)(2x - 1) \tag{39}$$

and when $a = 0$

$$\sum x^2 y f_{xy} = \sum_1^k C(y_x)(2x - 1). \tag{40}$$
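Formulas (38) and (39) can be sketched for individual pairs of scores as follows. The six pairs are hypothetical and chosen only for illustration, with one value of x left blank on purpose; `product_moments` is an illustrative name.

```python
from collections import defaultdict

def product_moments(pairs):
    """Return (sum x y, sum x^2 y) from the first order descending
    cumulation of y_x, the sum of the y's sharing a value of x."""
    a = min(x for x, _ in pairs)
    top = max(x for x, _ in pairs)
    y_x = defaultdict(int)
    for x, y in pairs:
        y_x[x] += y
    running, c = 0, {}
    for x in range(top, a - 1, -1):   # every x, so a blank class counts as 0
        running += y_x[x]
        c[x] = running
    higher = range(a + 1, top + 1)
    sum_xy = a * c[a] + sum(c[x] for x in higher)                    # (38)
    sum_x2y = a * a * c[a] + sum(c[x] * (2 * x - 1) for x in higher)  # (39)
    return sum_xy, sum_x2y

# Hypothetical scores; no item has x = 4.
pairs = [(1, 2), (1, 3), (2, 1), (3, 4), (3, 0), (5, 5)]
sum_xy, sum_x2y = product_moments(pairs)
print(sum_xy, sum_x2y)
```

Looping over every integer between a and the largest x is what guards against the blank values of x mentioned in section 10.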


The method is also adapted to the common problem of finding correlation 
coefficients from grouped data when Hollerith machines are not available and 
this method is recommended for the determination of these coefficients. 

An illustration is presented in Table VII which shows the correlation existing 
between college first semester average, X, and preparatory school average, Y, for 
1126 students entering the College of Literature, Science and the Arts of the 
University of Michigan in 1928. The coded values of X and Y are indicated by 
x and y and are non-negative integers beginning with 0. The coded values are given in descending order beginning with the upper left hand corner of the chart. The values of the cumulations are placed at the right hand side and at the bottom of the chart.


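TABLE VII

Correlation with cumulative totals

[Correlation chart of the coded values x and y, with the cumulations at the right hand side and at the bottom. Only the class intervals 3.50-3.99, 3.00-3.49, 2.50-2.99, 2.00-2.49, 1.50-1.99, 1.00-1.49, .50-.99 and the following two rows of cumulations, each ending in its total, survive.]

    61   259   661  1330  2194  2578  2923  2993 | 12815
   104   454  1096  2196  3560  4097  4339  4399 | 20245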


The lower right hand corner has the entries $\sum x$, $\sum y$, $\sum x^2$, $\sum xy$, and $\sum y^2$, where $\sum xy$, $\sum x^2$, and $\sum y^2$ are obtained by adding the cumulations in the columns or rows involved.


The values $C(y_x)$ are easily computed from columns (2) and (3). The values of $C(x_x)$ are computed by forming the cumulated product of the row frequency and $x$. The values are recorded when the products contributed by a given row have been computed. The values $C(y_y)$ and $C(x_y)$ are obtained similarly.

The value of $r$ is easily obtained from the lower right hand entries. The value $A_{xy} = N\sum xy - (\sum x)(\sum y)$ is obtained from diagonal entries, $A_{xx} = N\sum x^2 - (\sum x)^2$ is obtained from entries in the last row, $A_{yy} = N\sum y^2 - (\sum y)^2$ is obtained from the last column, and

$$r = \frac{A_{xy}}{\sqrt{A_{xx}A_{yy}}}$$

is easily computed. In the above problem $r_{xy} = .441$.

The values $M_x$, $M_y$, $\sigma_x$, $\sigma_y$ are also easily obtained from the lower right hand entries. The form indicating the successive steps is not reproduced here.
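The arithmetic for r can be sketched from the five sums and N. The scores below are hypothetical and are not those of Table VII; the expressions for the means and standard deviations are the usual identities $M_x = \sum x / N$ and $\sigma_x = \sqrt{A_{xx}}/N$, supplied here as an assumption since the original form is missing.

```python
from math import sqrt

def correlation_from_sums(n, sx, sy, sxx, syy, sxy):
    """r = A_xy / sqrt(A_xx * A_yy), together with means and sigmas."""
    a_xy = n * sxy - sx * sy
    a_xx = n * sxx - sx * sx
    a_yy = n * syy - sy * sy
    r = a_xy / sqrt(a_xx * a_yy)
    return r, sx / n, sy / n, sqrt(a_xx) / n, sqrt(a_yy) / n

# Hypothetical coded scores (x, y), beginning with 0.
pairs = [(0, 1), (0, 2), (1, 0), (2, 3), (2, 0), (3, 4)]
n = len(pairs)
sx = sum(x for x, _ in pairs)
sy = sum(y for _, y in pairs)
sxx = sum(x * x for x, _ in pairs)
syy = sum(y * y for _, y in pairs)
sxy = sum(x * y for x, y in pairs)
r, m_x, m_y, sigma_x, sigma_y = correlation_from_sums(n, sx, sy, sxx, syy, sxy)
print(round(r, 4))
```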





Recent methods of applying cumulative totals theory to correlation are found in 
references [9], [14], [17], [18], [19]. 

The third order moments are obtained by multiplying the entries of $C(x_x)$, $C(y_y)$, $C(x_y)$, $C(y_x)$ by 1, 3, 5, etc. as indicated by (40). Thus $\sum x^3 f_x = 4399 + 3(4339) + \text{etc.} = 102{,}103$; $\sum x^2 y f_{xy} = 63121$; $\sum xy^2 f_{xy} = 46047$; $\sum y^3 f_y = 38{,}633$. It is hence possible to compute the skewness of each marginal distribution from Table VII. See also [18, p. 657].
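The value 102,103 can be retraced from one row of cumulations. The sketch below rests on assumptions: the row is read as $C(x_x)$ for x = 8, ..., 1 with values 104, 454, 1096, 2196, 3560, 4097, 4339, 4399, where the entry 4339 is taken from the expression 4399 + 3(4339) in the text, and N = 1126 is the number of students.

```python
# Assumed first order cumulations C(x_x) of Table VII, keyed by coded x.
C_X = {8: 104, 7: 454, 6: 1096, 5: 2196, 4: 3560, 3: 4097, 2: 4339, 1: 4399}
N = 1126

sum_x = C_X[1]                                       # last cumulation
sum_x2 = sum(C_X.values())                           # multipliers 1, 1, 1, ...
sum_x3 = sum(c * (2 * x - 1) for x, c in C_X.items())  # multipliers 1, 3, 5, ...

# Third moment about the mean of the coded x, from which skewness follows.
mean_x = sum_x / N
mu3 = (sum_x3 - 3 * mean_x * sum_x2 + 3 * mean_x ** 2 * sum_x
       - N * mean_x ** 3) / N
print(sum_x2, sum_x3)
```

Under these assumptions the row total 20245 is $\sum x^2 f_x$ and the odd-integer multipliers give $\sum x^3 f_x$.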


12. Conclusion. This paper presents an outline of the computation of 
moments with the use of cumulative totals and columnar multipliers. Basic 
general theorems are derived and applications are made to one variable and two 
variable distributions both with and without punched card equipment. The 
formulas assume that the distance between successive variates (or class marks) 
is unity, but the reader should have no trouble in adapting the formulas to more 
general problems. 

In the interest of brevity the development is limited to the descending cumulations. It is possible to parallel the development here by deriving formulas in terms of ascending cumulations. It is also possible to work out formulas showing relations between columnar, row, and diagonal multipliers. There are other applications, such as to the evaluation of $\sum_1^k x^s$, which are of interest. It is possible also that applications may be found for the general theory of sections 2 and 3 which do not demand that $v_x$ be a power function.


THE UNIVERSITY OF MICHIGAN.







REFERENCES 


[1] G. F. Lipps, "Die Theorie der Kollectivgegenstände," Philosophische Studien (Wundt, Editor), Vol. 17, (1901) pp. 467-575.

[2] G. F. Hardy, Theory of the Construction of Tables of Mortality, pp. 59-62 and 124-128.

[3] W. P. Elderton, Frequency Curves and Correlation, pp. 19-23.

[4] W. F. Sheppard, "Factorial Moments in Terms of Sums or Differences," Proc. of London Math. Society, 2, Vol. 13, (1913) pp. 81-96. Also, "Fitting Polynomials by the Method of Least Squares," ibid. pp. 97-108.

[5] J. Steffensen, "Factorial Moments and Discontinuous Frequency Functions," Skandinavisk Aktuarietidskrift, 6, (1923) pp. 73-89.

[6] J. Steffensen, Interpolation, pp. 93-104.

[7] E. T. Whittaker and G. Robinson, Calculus of Observations, pp. 191-194.

[8] R. Frisch, "Sur le calcul numérique des moments ordinaires et des moments composés d'une distribution statistique," Skandinavisk Aktuarietidskrift, Vol. 10, (1927) pp. 81-91.

[9] R. M. Mendenhall and R. Warren, "The Mendenhall-Warren-Hollerith Correlation Method," Columbia University Statistical Bureau Document No. 1, 1929, Columbia University, New York, 43 pp.

[10] R. M. Mendenhall and R. Warren, "Computing Statistical Coefficients from Punched Cards," Jour. of Ed. Psy., Vol. 21, (1930) pp. 53-62.

[11] C. Jordan, "Approximation and Graduation According to the Principle of Least Squares by Orthogonal Polynomials," Annals of Math. Stat., 3, (1932) pp. 257-358.

[12] A. C. Aitken, "On the Graduation of Data by the Orthogonal Polynomials of Least Squares," Proc. of Roy. Soc. of Edin., Vol. 53, pp. 54-78.

[13] A. C. Aitken, "On Fitting Polynomials to Weighted Data by Least Squares," Proc. of Roy. Soc. of Edin., Vol. 54, (1933-34) pp. 1-11.

[14] Chen-nan Li, "Summation Method of Fitting Parabolic Curves and Calculating Linear and Curvilinear Coefficients on a Scatter Diagram," Jour. of Am. Stat. Assn., 29, (1934) pp. 405-409.

[15] M. Sasuly, Trend Analysis of Statistics, Chap. VIII. Also page 5.

[16] H. C. Carver, "Uses of the Automatic Multiplying Punch"; Punched Card Method in Colleges and Universities, pp. 417-422.

[17] A. E. Brandt, "Uses of the Progressive Digit Method"; Punched Card Method in Colleges and Universities, pp. 423-436.

[18] P. S. Dwyer and A. D. Meacham, "The Preparation of Correlation Tables on a Tabulator Equipped with Digit Selection," Jour. Am. Stat. Assn., Vol. 32, (1937) pp. 654-662.

[19] W. N. Durost and H. M. Walker, Durost-Walker Correlation Chart, World Book Co., N. Y. (1938).

[20] H. L. Rietz, Mathematical Statistics, 1927.










A NOTE ON THE DERIVATION OF FORMULAE FOR MULTIPLE 
AND PARTIAL CORRELATION* 


By Louis Guttman








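1. Multiple Correlation. Let the measurements of $N$ individuals on each of the $n$ variables $x_1, x_2, \cdots, x_k, \cdots, x_n$ be expressed as relative deviates; that is, such that

$$\Sigma x_k = 0, \qquad \Sigma x_k^2 = N, \qquad k = 1, 2, 3, \cdots, n,$$

where the summations extend over the $N$ individuals.

If values of $\lambda_k$ are determined so that

$$\Sigma (x_1 - \lambda_2 x_2 - \lambda_3 x_3 - \cdots - \lambda_n x_n)^2 \ \text{is a minimum,}$$

and if we let

$$X_1 = \lambda_2 x_2 + \lambda_3 x_3 + \cdots + \lambda_n x_n, \tag{1}$$

then the multiple correlation coefficient, obtained from the regression of $x_1$ on the remaining $n - 1$ variables, is defined as

$$r_{1.234\cdots n} = r_{x_1 X_1}.$$

The square of the standard error of estimate of $x_1$ on the remaining $n - 1$ variables is defined as

$$\sigma^2_{1.234\cdots n} = \frac{1}{N}\Sigma (x_1 - X_1)^2.$$

The minimizing values for $\lambda_k$ are obtained from the normal equations

$$\Sigma (x_1 - X_1)x_k = 0, \qquad k = 2, 3, \cdots, n, \tag{2}$$

which may be written in expanded notation as

$$\begin{aligned}
\lambda_2 + r_{23}\lambda_3 + r_{24}\lambda_4 + \cdots + r_{2n}\lambda_n &= r_{12} \\
r_{32}\lambda_2 + \lambda_3 + r_{34}\lambda_4 + \cdots + r_{3n}\lambda_n &= r_{13} \\
\cdots\cdots\cdots\cdots\cdots\cdots & \\
r_{n2}\lambda_2 + r_{n3}\lambda_3 + r_{n4}\lambda_4 + \cdots + \lambda_n &= r_{1n}
\end{aligned}$$

where $r_{jk} = \frac{1}{N}\Sigma x_j x_k = r_{kj}$, $r_{jj} = 1$.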


N 












* The notions involved in this demonstration are certainly well-known. However, 
the directness and simplicity of the derivations may lend some merit to their exhibition. 
The writer is indebted to Professor Dunham Jackson for useful advice. 




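From Cramer's rule it is seen that

$$\lambda_k = -\frac{R_{1k}}{R_{11}} \qquad \text{if } k \neq 1,\ R_{11} \neq 0,$$

where $R_{jk}$ is the cofactor of $r_{jk}$ (or of $r_{kj}$) in the symmetric determinant

$$R = \begin{vmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{vmatrix}.$$

Summing both sides of (1) over the $N$ individuals shows that $\Sigma X_1 = 0$, so that the variance of $X_1$ is

$$\sigma^2_{X_1} = \frac{1}{N}\Sigma X_1^2.$$

From (2), the residual $(x_1 - X_1)$ is orthogonal to each of the $x_k$ except $x_1$; therefore the residual is orthogonal to any linear combination of these $x_k$ and in particular to $X_1$; that is,

$$\Sigma (x_1 - X_1)X_1 = 0, \tag{3}$$

or

$$\Sigma x_1 X_1 = \Sigma X_1^2,$$

and therefore

$$r_{x_1 X_1} = \sigma_{X_1}. \tag{4}$$

Multiplying both sides of (1) by $\frac{x_1}{N}$ and summing over the individuals, we get:

$$\begin{aligned}
\sigma_{X_1} r_{x_1 X_1} &= r_{12}\lambda_2 + r_{13}\lambda_3 + \cdots + r_{1n}\lambda_n \\
&= -\frac{1}{R_{11}}(r_{12}R_{12} + r_{13}R_{13} + \cdots + r_{1n}R_{1n}) \\
&= 1 - \frac{R}{R_{11}}.
\end{aligned}$$

From (4) then,

$$r^2_{1.234\cdots n} = 1 - \frac{R}{R_{11}}.$$

It is clear that in general

$$r^2_{k.123\cdots,k-1,k+1,\cdots n} = 1 - \frac{R}{R_{kk}}.$$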
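The cofactor formulas can be tried on a small case. The following sketch assumes three variables with hypothetical correlations $r_{12} = .6$, $r_{13} = .5$, $r_{23} = .3$ and uses a plain cofactor expansion for the determinants; the function names are illustrative only.

```python
def minor(m, i, j):
    """Matrix m with row i and column j struck out."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    return (-1) ** (i + j) * det(minor(m, i, j))

def regression_weights(r):
    """lambda_k = -R_1k / R_11 for k = 2, ..., n."""
    r11 = cofactor(r, 0, 0)
    return [-cofactor(r, 0, k) / r11 for k in range(1, len(r))]

def multiple_r_squared(r, k=0):
    """r^2 = 1 - R / R_kk for the regression of variable k on the rest."""
    return 1 - det(r) / cofactor(r, k, k)

# Hypothetical correlation matrix for three variables.
r = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.3],
     [0.5, 0.3, 1.0]]
lam2, lam3 = regression_weights(r)
r_sq = multiple_r_squared(r)
print(lam2, lam3, r_sq)
```

The standard error of estimate then follows as the square root of $1 - r^2 = R/R_{11}$.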







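To find the standard error of estimate, expand

$$\begin{aligned}
\frac{1}{N}\Sigma (x_1 - X_1)^2 &= 1 - 2\sigma_{X_1} r_{x_1 X_1} + \sigma^2_{X_1} \\
&= 1 - r^2_{x_1 X_1} \\
&= \frac{R}{R_{11}}.
\end{aligned}$$

In general, when $\sigma_k = 1$,

$$\sigma^2_{k.123\cdots,k-1,k+1,\cdots n} = \frac{R}{R_{kk}}. \tag{5}$$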
2. Partial Correlation. If values of $\mu_k$ and $\nu_k$ are determined so that

$$\Sigma (x_1 - \mu_3 x_3 - \mu_4 x_4 - \cdots - \mu_n x_n)^2 \ \text{is a minimum}$$

and

$$\Sigma (x_2 - \nu_3 x_3 - \nu_4 x_4 - \cdots - \nu_n x_n)^2 \ \text{is a minimum,}$$

and if we let

$$\begin{aligned}
Y_1 &= \mu_3 x_3 + \mu_4 x_4 + \cdots + \mu_n x_n \\
Y_2 &= \nu_3 x_3 + \nu_4 x_4 + \cdots + \nu_n x_n,
\end{aligned} \tag{6}$$

then the partial correlation coefficient between $x_1$ and $x_2$, holding the remaining $n - 2$ variables constant, is defined as

$$r_{12.34\cdots n} = r_{(x_1 - Y_1)(x_2 - Y_2)},$$

and since $\Sigma (x_k - Y_k) = 0$,

$$r_{12.34\cdots n} = \frac{\frac{1}{N}\Sigma (x_1 - Y_1)(x_2 - Y_2)}{\sigma_{1.34\cdots n}\,\sigma_{2.34\cdots n}}. \tag{7}$$

Each $\mu_k$ is the negative of the ratio of the cofactor of $r_{1k}$ to the cofactor of $r_{11}$ in the determinant obtained by striking out the second row and the second column from $R$. We shall use the notation $R_{hi \cdot jk}$ to mean the algebraic complement of the second order minor in $R$, whose complement is obtained by striking out row $h$ and column $i$ and then row $j$ and column $k$. Then

$$\mu_k = -\frac{R_{22 \cdot 1k}}{R_{22 \cdot 11}}.$$

By argument similar to that used in (3),

$$\Sigma (x_1 - Y_1)Y_2 = 0,$$

$$\Sigma x_1 Y_2 = \Sigma Y_1 Y_2.$$





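Similarly,

$$\Sigma x_2 Y_1 = \Sigma Y_1 Y_2.$$

Then the numerator of the right member of (7) becomes, after expanding and collecting terms,

$$r_{12} - \sigma_{Y_1} r_{x_2 Y_1}. \tag{8}$$

Multiplying both sides of (6) by $\frac{x_2}{N}$ and summing over the $N$ individuals, we have,

$$\begin{aligned}
\sigma_{Y_1} r_{x_2 Y_1} &= r_{23}\mu_3 + r_{24}\mu_4 + \cdots + r_{2n}\mu_n \\
&= -\frac{1}{R_{22 \cdot 11}}(r_{23}R_{22 \cdot 13} + r_{24}R_{22 \cdot 14} + \cdots + r_{2n}R_{22 \cdot 1n}) \\
&= r_{12} + \frac{R_{12}}{R_{22 \cdot 11}}.
\end{aligned} \tag{9}$$

Analogous to (5), we have,

$$\sigma^2_{1.34\cdots n} = \frac{R_{22}}{R_{22 \cdot 11}}, \qquad \sigma^2_{2.34\cdots n} = \frac{R_{11}}{R_{11 \cdot 22}}. \tag{10}$$

From (8), (9), and (10) the right member of (7) becomes

$$\frac{-R_{12}}{\sqrt{R_{11}R_{22}}}.$$

It is seen that in general

$$r_{jk.12\cdots} = \frac{-R_{jk}}{\sqrt{R_{jj}R_{kk}}}.$$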
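The general result can be compared with the familiar three-variable expression for a partial correlation. The sketch assumes the hypothetical correlations $r_{12} = .6$, $r_{13} = .5$, $r_{23} = .3$; the helper functions repeat a plain cofactor expansion and their names are illustrative only.

```python
from math import sqrt

def minor(m, i, j):
    """Matrix m with row i and column j struck out."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    return (-1) ** (i + j) * det(minor(m, i, j))

def partial_r(r, j, k):
    """Partial correlation of variables j and k: -R_jk / sqrt(R_jj R_kk)."""
    return -cofactor(r, j, k) / sqrt(cofactor(r, j, j) * cofactor(r, k, k))

# Hypothetical correlation matrix for three variables.
r = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.3],
     [0.5, 0.3, 1.0]]
r12_3 = partial_r(r, 0, 1)
closed_form = (0.6 - 0.5 * 0.3) / sqrt((1 - 0.5 ** 2) * (1 - 0.3 ** 2))
print(r12_3, closed_form)
```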


UNIVERSITY OF MINNESOTA. 





NOTE ON REGRESSION FUNCTIONS IN THE CASE OF THREE 
SECOND ORDER RANDOM VARIABLES 


By Clyde A. Bridger


The study of the correlation of two second-order random variables has received the attention of several authors, among them Yule [1], Charlier [2], Wicksell [3, 4], and Tschuprow [5]. Yule writes of them under the guise of "attributes." The study of three or more second order random variables has lagged behind. In this note we shall examine the regression function of one second order random variable on two others by considering the problem from the point of view of Tschuprow's [6] paper on the correlation of three random variables.

A variable $X$ that takes on $m$ values $x_1, \cdots, x_m$ with corresponding probabilities $p_1, \cdots, p_m$ subject to the condition $\sum_i p_i = 1$ is defined as a random variable of order $m$. (In particular, if $X$ takes on only two values, $x$ and $x'$, with probabilities $p$ and $q$, where $p + q = 1$, $X$ is a random variable of second order.) The system of values $x$ and probabilities $p$ constitute the law of distribution of $X$. In the case of two random variables, $X$ and $Y$, there exists a joint distribution law, covering all possible combinations of $X$ and $Y$, together with their associated probabilities $p_{11}, \cdots, p_{mn}$; the joint distribution law contains all of the information regarding the stochastical dependence of $X$ and $Y$.

The extension to more than two variables is immediate. Let $p_{ijk}$ represent the probability of the simultaneous occurrence of the set of values $x_i, y_j, z_k$ of three random variables $X$, $Y$, and $Z$; $p_{ij}$, that of the simultaneous occurrence of $x_i, y_j$ together without reference to $Z$; $p_i$, that of the occurrence of $x_i$ without reference to $Y$ or $Z$; etc. Then, we have relationships of the types

$$\sum_i\sum_j\sum_k p_{ijk} = \sum_i\sum_j p_{ij} = \sum_i p_i = 1; \quad \sum_j p_{ij} = p_i; \quad \sum_j\sum_k p_{ijk} = \sum_j p_{ij} = \sum_k p_{ik} = p_i; \quad \text{etc.}$$

Similarly, let $p^{(i)}_{jk}$ be the probability of the simultaneous occurrence of $y_j$ and $z_k$ on the condition that $X$ takes on the value $x_i$; $p^{(i)}_j$, that of the occurrence of $y_j$ without reference to $Z$, on the same condition; etc. Then

$$\sum_j p^{(i)}_j = \sum_k p^{(i)}_k = \sum_j\sum_k p^{(i)}_{jk} = 1; \quad \sum_k p^{(i)}_{jk} = p^{(i)}_j; \quad p_i p^{(i)}_j = p_{ij};$$

$$p_i p^{(i)}_{jk} = p_{ij}p^{(ij)}_k = p_i p^{(i)}_j p^{(ij)}_k = p_{ijk}; \quad \sum_i p_i p^{(i)}_j = p_j; \quad \text{etc.}$$


Denoting by $E(x)$ or simply $Ex$ the expression "the mean value or mathematical expectation of $x$," we have $m_{fgh} = EX^fY^gZ^h = \sum_i\sum_j\sum_k p_{ijk}\,x_i^f y_j^g z_k^h$. In particular, the mean values of the distributions are given by $m_X = EX = \sum_i p_i x_i$, $m_Y = EY = \sum_j p_j y_j$, $m_Z = EZ = \sum_k p_k z_k$. Then we may write

$$\mu_{fgh} = E(X - m_X)^f(Y - m_Y)^g(Z - m_Z)^h = Eu^fv^gw^h = \sum_i\sum_j\sum_k p_{ijk}(x_i - m_X)^f(y_j - m_Y)^g(z_k - m_Z)^h.$$

The quantities $\mu$ may be identified as terms in the expression for the moments for the sum of three variables as follows: $E(u + v + w)^n = Eu^n + nEu^{n-1}v + \cdots + kEu^fv^gw^h + \cdots + Ew^n$, where $f + g + h = n$. If $n = 2$, we have the variance of the sum of three variables given by $\mu_{2..} + 2\mu_{11.} + \mu_{.2.} + 2\mu_{.11} + 2\mu_{1.1} + \mu_{..2}$, where the dots in the subscripts indicate variables not considered. Thus $\mu_{2..}$ refers to the second moment of the distribution of the variable $X$ about its mean, $m_X$, without consideration of the distributions of $Y$ or $Z$. If every term of the expansion of the $n$-th moment of the sum of three variables is divided by the quantity $\sqrt{\mu_{2..}^f\,\mu_{.2.}^g\,\mu_{..2}^h}$, the expansion takes the "normal form." The type term is $r_{fgh} = \mu_{fgh}/\sqrt{\mu_{2..}^f\,\mu_{.2.}^g\,\mu_{..2}^h}$. In the case of one variable, $r_f = \mu_f/\sqrt{\mu_2^f}$, so $r_1 = 0$, $r_2 = 1$, $r_3 = \sqrt{\beta_1}$, $r_4 = \beta_2$, etc. In the case of two variables, $r_{1.} = r_{.1} = 0$, $r_{2.} = r_{.2} = 1$, $r_{11}$ = Pearson's product-moment coefficient of correlation, etc. Functions of parameters $r$ will serve to characterize the law of correlation among the variables.
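For a variable of second order the parameters $r_f$ are quickly computed. The sketch below uses hypothetical values $x = 1$, $x' = 0$ with probabilities .3 and .7, and compares the results with the closed forms $r_3 = (q - p)/\sqrt{pq}$ and $r_4 = 1/(pq) - 3$, which are the standard expressions for a two-valued variable with $x > x'$ and are brought in here as an assumption.

```python
from math import sqrt

def standardized_moment(values, probs, f):
    """r_f = mu_f / sqrt(mu_2)^f for a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    mu_f = sum(p * (v - mean) ** f for v, p in zip(values, probs))
    return mu_f / sqrt(var) ** f

# Hypothetical second order variable: value x with probability p,
# value x' with probability q = 1 - p, and x > x'.
x, x_prime = 1.0, 0.0
p, q = 0.3, 0.7
r = {f: standardized_moment([x, x_prime], [p, q], f) for f in (1, 2, 3, 4)}
print(r)
```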

By writing the expressions with superscript $(i)$ to denote that the values of the distributions of $Y$ and $Z$ are those which correspond to a fixed value $x_i$ of the distribution of $X$, we have $m_Y^{(i)} = (EY)^{(i)}$, $m_Z^{(i)} = (EZ)^{(i)}$, $\mu_{.gh}^{(i)} = E^{(i)}(Y - m_Y^{(i)})^g(Z - m_Z^{(i)})^h$, $r_{.gh}^{(i)} = \mu_{.gh}^{(i)}/\sqrt{[\mu_{.2.}^{(i)}]^g[\mu_{..2}^{(i)}]^h}$. (For $g = h = 1$, $r_{.11}^{(i)}$ is the conditional coefficient of correlation between $Y$ and $Z$ for $X = x_i$.) Thus it follows that we can study the correlation between $Y$ and $Z$ for each value of $X$ separately.

For second order random variables, some changes in notation can be made. Let $p_x$ and $p_{x'}$ be the probabilities corresponding to the values $x$ and $x'$, respectively, of $X$; $p_y$ and $p_{y'}$ correspond to $y$ and $y'$, respectively; $p_z$ and $p_{z'}$ correspond to $z$ and $z'$, respectively. Also, let $p_{xy}$ represent the probability of the simultaneous occurrence of $x$ and $y$ together without reference to the distribution of $Z$, etc., and $p_{xyz}$ represent the probability of the simultaneous occurrence of all three values, $x$, $y$, $z$, of their respective distributions, etc. Then, $p_x + p_{x'} = p_y + p_{y'} = p_z + p_{z'} = 1$; $p_{xy} + p_{xy'} = p_x$; $p_{xyz} + p_{xyz'} + p_{xy'z} + p_{xy'z'} = p_x$; etc.

Let us set up a system of normal coordinates in which the values along the $U$-axis are defined by $U_i = \dfrac{x_i - m_X}{\sqrt{\mu_{2..}}}$, those along the $V$-axis by $V_j = \dfrac{y_j - m_Y}{\sqrt{\mu_{.2.}}}$, and those along the $W$-axis by $W_k = \dfrac{z_k - m_Z}{\sqrt{\mu_{..2}}}$. Let $m_Z^{(ij)}$ represent the mean of the set of values of the $Z$ distribution which correspond to the fixed pair of values, $(x_i, y_j)$, of the $X$ and $Y$ distributions. Then, in the new coordinate system, the same thing is given by $M_W^{(ij)} = \dfrac{m_Z^{(ij)} - m_Z}{\sqrt{\mu_{..2}}}$. Now, the series of values $M_W^{(ij)}$ obtained by giving $i$ and $j$ different values for the pair $(U_i, V_j)$ determine what is called the regression function of $W$ on $U$ and $V$ (or, in the original notation, the surface of regression of the distribution of $Z$ on the distributions of $X$ and $Y$). Similarly, the values of $[M_W]^{(ij)} = \dfrac{m_Z^{(ij)} - m_Z^{(i)}}{\sqrt{\mu_{..2}^{(i)}}}$ obtained by fixing $U$ and varying $V$ in the set $(U_i, V_j)$ determine what is called the conditional line of regression of $W$ on $V$ for a fixed value of $U$. With these definitions we shall consider the problem of finding a regression function of $W$ on $U$ and $V$ for three second order random variables.

For convenience, write $\delta_{xy} = p_{xy} - p_xp_y$, $\delta_{xz} = p_{xz} - p_xp_z$, $\delta_{yz} = p_{yz} - p_yp_z$; $\alpha_{yz} = p_xp_{xyz} - p_{xy}p_{xz}$, $\epsilon_z = p_{xyz} - p_{xy}p_z$, $\beta_{yz} = p_{x'}p_{x'yz} - p_{x'y}p_{x'z}$, $\theta_{xyz} = \epsilon_z - p_y\delta_{xz} - p_x\delta_{yz} = \epsilon_y - p_x\delta_{yz} - p_z\delta_{xy} = \epsilon_x - p_y\delta_{xz} - p_z\delta_{xy}$. Direct substitutions into the several formulas developed above then give us the representative forms to be used in subsequent calculations:

$$x - m_X = p_{x'}(x - x'), \qquad x' - m_X = -p_x(x - x').$$

7 Pz’ — Pz 
WV Dz Dez! 

] = 


— 3, ru. ee Yo1. = 11.73.., 
Pz Pr' 


V Dz Pz’ Py Py’ 


T.7.3., Trg. = Ty.7.4., Tee. = Ty.13..7.3. + a 


’ ye 
Mr = Det + Pet, 1.. = 0, 7e.. = 1, 1... 


= ri. (Yee. = 1), You = 73..Tur + Pen, 


r.3-Tin tia, Tue = %..3Tim + 7ru., 


Oryz i _ Pz’ ls —Dz 


’ 


ee ’ 
V Dz Pz’ 
€z M (12-) Ors — 
Ww = - 


Pry V 0: D2" Pry’ V 02 Dz’ 


V Dz Pz’ Py Py’ Dz D2’ Dede’ 


byz — € M (22-) &- bxz — byz 
; Ww oa“ Le , 
Pr'y’ V De D2’ 


Pr'y V De Dz’ 


[My] = . [My }e” a Byz 
Pry Pzz Pzz’ 


[M yt?" —Qyz (My ]2 = — — Bye o 
Drv! V Das Des’ Do'y’ V Des Do's’ 

In the case of correlation of two second order random variables, a linear regression function can always be found [3, 5]. Similarly, the conditional regression functions in the case of three second order random variables can always be taken as linear. If we take as the form of the regression function of $W$ on $U$ and $V$ the form $M_W^{(ij)} = aU_i + bV_j + cU_iV_j + d$, where $a$, $b$, $c$, $d$ are constants to be determined by direct substitution for $U_i$ and $V_j$ from the distributions of $X$ and $Y$, it is seen that linearity of all total and conditional regression functions is preserved. By total regression function, we mean the regression of $W$ on $U$ or $W$ on $V$.

Now consider the problem of finding $a$, $b$, $c$, $d$. The direct substitution provides us with four linearly dependent equations in four unknowns. Linear combinations reduce the set to three, from which the relationship $d = -cr_{11.}$ is obtained. By building up the various terms in the equations through dividing by the necessary values of $p$, the parameters $r$ can be made to appear. Further combinations now reduce the set to the following three:

$$\begin{aligned}
r_{111} &= ar_{21.} + br_{12.} + c(r_{22.} - r_{11.}^2) \\
r_{.11} &= ar_{11.} + b + cr_{12.} \\
r_{1.1} &= a + br_{11.} + cr_{21.}
\end{aligned}$$

The solution gives

$$a = \frac{r_{1.1} - r_{.11}r_{11.}}{1 - r_{11.}^2} - \frac{r_{21.} - r_{11.}r_{12.}}{1 - r_{11.}^2}\,c = a' - a''c$$

$$b = \frac{r_{.11} - r_{1.1}r_{11.}}{1 - r_{11.}^2} - \frac{r_{12.} - r_{11.}r_{21.}}{1 - r_{11.}^2}\,c = b' - b''c$$

$$c = (1 - r_{11.}^2)(r_{111} - a'r_{21.} - b'r_{12.}) \div \Delta, \quad \text{where} \quad \Delta = \begin{vmatrix} 1 & r_{11.} & r_{21.} \\ r_{11.} & 1 & r_{12.} \\ r_{21.} & r_{12.} & r_{22.} - r_{11.}^2 \end{vmatrix}.$$

The regression function becomes

$$M_W^{(ij)} = a'U_i + b'V_j - c(r_{11.} + a''U_i + b''V_j - U_iV_j).$$

If $c = 0$ the surface is a plane. Examination of the characteristics of $r_{111}$ shows that generally $c$ cannot be zero. The vanishing of $c$ implies that special relations must exist between $p_{ijk}$ and $p_{ij}$, $p_{ik}$, $p_{jk}$.

Two constants of considerable importance in the theory of correlation are the multiple correlation coefficient and the multiple correlation ratio. For the regression of $W$ on $U$ and $V$, the former is defined as $R_W^2 = a'r_{1.1} + b'r_{.11}$, and the latter as $\eta_W^2 = \sum_i\sum_j p_{ij}\{M_W^{(ij)}\}^2$. For planar regression, the difference $\eta_W^2 - R_W^2$ must vanish. For others, the difference takes on values characteristic of the regression function. To find the value it takes for our case, we set up the value of $\eta_W^2$ from the regression function just given and subtract $R_W^2$.

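By direct substitution, we have

$$\eta_W^2 - R_W^2 = \sum_i\sum_j p_{ij}\{a'U_i + b'V_j - c(r_{11.} + a''U_i + b''V_j - U_iV_j)\}^2 - a'r_{1.1} - b'r_{.11}.$$

Since $\sum_i\sum_j p_{ij}U_i^2 = 1$, $\sum_i\sum_j p_{ij}(U_iV_j)^2 = r_{22.}$, etc., we find rather easily that

$$\eta_W^2 - R_W^2 = c^2(r_{22.} - r_{11.}^2 - a''r_{21.} - b''r_{12.}).$$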
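The solution for $a$, $b$, $c$ and the value of the difference can be checked numerically. The following sketch rests on a hypothetical table of eight joint probabilities and hypothetical values 1 and 0 for each variable; it assumes the regression function in the form $aU_i + bV_j + cU_iV_j - cr_{11.}$ and compares it with the conditional mean of $W$ in each of the four cells.

```python
from math import sqrt
from itertools import product

# Hypothetical p_ijk; index 0 stands for x, y, z and index 1 for x', y', z'.
P = {(0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.05, (0, 1, 1): 0.15,
     (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30}
VALUES = [(1.0, 0.0)] * 3   # hypothetical (value, primed value) of X, Y, Z

def expect(g):
    """Mathematical expectation of g(i, j, k) under P."""
    return sum(p * g(*idx) for idx, p in P.items())

def normal_coordinate(axis):
    """Map an index on the given axis to its standardized value."""
    vals = VALUES[axis]
    mean = expect(lambda *idx: vals[idx[axis]])
    sd = sqrt(expect(lambda *idx: (vals[idx[axis]] - mean) ** 2))
    return lambda i: (vals[i] - mean) / sd

U, V, W = (normal_coordinate(axis) for axis in range(3))

def r(f, g, h):
    return expect(lambda i, j, k: U(i) ** f * V(j) ** g * W(k) ** h)

r11, r21, r12, r22 = r(1, 1, 0), r(2, 1, 0), r(1, 2, 0), r(2, 2, 0)
r1_1, r_11, r111 = r(1, 0, 1), r(0, 1, 1), r(1, 1, 1)

d = 1 - r11 ** 2
a1, a2 = (r1_1 - r_11 * r11) / d, (r21 - r11 * r12) / d    # a', a''
b1, b2 = (r_11 - r1_1 * r11) / d, (r12 - r11 * r21) / d    # b', b''
q = r22 - r11 ** 2
delta = q * d - r21 ** 2 - r12 ** 2 + 2 * r11 * r21 * r12   # determinant
c = d * (r111 - a1 * r21 - b1 * r12) / delta
a, b = a1 - a2 * c, b1 - b2 * c

def p_ij(i, j):
    return P[(i, j, 0)] + P[(i, j, 1)]

def fitted(i, j):
    return a * U(i) + b * V(j) + c * U(i) * V(j) - c * r11

def conditional_mean(i, j):
    return sum(P[(i, j, k)] * W(k) for k in (0, 1)) / p_ij(i, j)

cells = list(product((0, 1), repeat=2))
eta2 = sum(p_ij(i, j) * fitted(i, j) ** 2 for i, j in cells)
big_r2 = a1 * r1_1 + b1 * r_11
print(eta2 - big_r2, c ** 2 * (q - a2 * r21 - b2 * r12))
```

With four cells and four constants the fitted surface passes through every conditional mean, which is why the comparison can be made cell by cell.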








We can also obtain the same value of $\eta_W^2 - R_W^2$ by direct substitution for the four values of $M_W^{(ij)}$ in $\eta_W^2$ and subtracting $R_W^2$. To actually obtain this is a long laborious process complicated by the fact that so many alternate forms for the answer are possible, of which only one is comparable with the value previously found. The general procedure is first to set up from the definition the expression

$$K = p_zp_{z'}\,\eta_W^2 = p_{xy}\left(\frac{\epsilon_z}{p_{xy}}\right)^2 + p_{xy'}\left(\frac{\delta_{xz} - \epsilon_z}{p_{xy'}}\right)^2 + p_{x'y}\left(\frac{\delta_{yz} - \epsilon_z}{p_{x'y}}\right)^2 + p_{x'y'}\left(\frac{\epsilon_z - \delta_{xz} - \delta_{yz}}{p_{x'y'}}\right)^2.$$

Then we build up each square by addition and subtraction so that it will contain a $\theta_{xyz}$ term. At the close of the process, we convert the whole expression into the parameters $r$ by dividing through by $p_zp_{z'}(p_xp_{x'}p_yp_{y'})$ and substituting from the list of representative forms given at the beginning of the paper. A matter of rearrangement now gives the same result as before.

From the symmetry involved, we can say that, in the case of the correlation of three second order random variables, the function representing the regression of one on the other two has an equation in normal coordinates of the form $M_W^{(ij)} = aU_i + bV_j + cU_iV_j - cr_{11.}$, where $a$, $b$, and $c$ satisfy equations of type

$$\begin{aligned}
r_{111} &= ar_{21.} + br_{12.} + c(r_{22.} - r_{11.}^2) \\
r_{.11} &= ar_{11.} + b + cr_{12.} \\
r_{1.1} &= a + br_{11.} + cr_{21.}
\end{aligned}$$

State Division of Public Health,
Boise, Idaho.


BIBLIOGRAPHY 


[1] G. Udny Yule. An Introduction to the Theory of Statistics. London: Charles Griffin & Co., Ltd., 1922. 6th Ed.

[2] C. V. L. Charlier. "Om korrelation mellan egenskaper inom den homograda statistiken." Svenska Aktuarieföreningens Tidskrift. Vol. I (1914), pp. 21-35.

[3] S. D. Wicksell. "Some theorems in the theory of probability, with special reference to their importance in the theory of homograde correlation." Svenska Aktuarieföreningens Tidskrift. Vol. III (1916), pp. 165-213.

[4] S. D. Wicksell. "On the correlation of acting probabilities." Skandinavisk Aktuarietidskrift. Vol. I (1918), pp. 98-135.

[5] A. A. Tschuprow. Grundbegriffe und Grundprobleme der Korrelationstheorie. Leipzig: B. G. Teubner, 1925.

[6] A. A. Tschuprow. (Translation into English by L. Isserlis.) "The Mathematical Theory of the Statistical Methods Employed in the Study of Correlation in the Case of Three Variables." Transactions of the Cambridge Philosophical Society. Vol. XXIII, no. 12 (1928), pp. 337-382.





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


SEPTEMBER, 1938 - $1.50 PER Copy - $6.00 PER ANNUM - VOL. 33 - No. 203 


CONTENTS 


ARTICLES 
How to Study the Social Aspects of the Depression . . . . Edwin B. Wilson
The χ²-Test of Significance . . . . Thornton C. Fry
Some Difficulties of Interpretation Encountered in the Application of the Chi-Square
Further Interpretations of the Chi-Square Test . . . . Burton H. Camp
Some Thoughts on Curve Fitting and the Chi Test . . . . W. Edwards Deming
Punched Card Technique for the Correction of Bias in Sampling . . . . C. W. Vickery
The Problem of the Stock Price Index Number . . . . Francis McIntyre
Problems in the Measurement of the Physical Volume of Output, by Industries . . . . Solomon Fabricant


Statistical News and Notes 


Book Reviews . 


Address inquiries and orders for subscriptions and back numbers to Frederick F. Stephan, 
Secretary, American Statistical Association, 722 Woodward Building, Washington, D. C. 



















THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 


OFFICERS FOR 1938 
President: 


B. H. Camp, Wesleyan University, Middletown, Conn. 


Vice-Presidents: 
P. R. Rider, Washington University, St. Louis, Mo.
S. S. Wilks, Princeton University, Princeton, N. J.


Secretary-Treasurer:
A. T. Craig, University of Iowa, Iowa City, Iowa.


The purpose of the Institute of Mathematical Statistics is to stimulate 
research in the mathematical theory of statistics and to promote coéperation 
between the field of pure research and the fields of application. 


Membership dues including subscription to the ANNALS OF MATHEMATICAL STATISTICS are $5.00 per year. The dues and all correspondence regarding membership in the Institute should be sent to the Secretary-Treasurer of the Institute.





